First, the asymptotic covariance matrix of the variance component estimates is used to
compare three common mating designs for efficiency (maximizing the variance-reducing property
of each observation) for genetic parameters across numbers of parents and locations and varying
genetic architectures. It is determined that the circular mating design is always superior in
efficiency to the half-diallel design. For single-tree heritability, the half-sib design is most
efficient. For estimating type B correlation, maximum efficiency is achieved by either the
half-sib or circular mating design, and the change in rank for efficiency is determined by the
underlying genetic architecture.

Another intent of this work is comparing analysis methodologies for determining parental
worth. The first of these investigations examines ordinary least squares assumptions in the
estimation of parental worth for the half-diallel mating design with balanced and unbalanced data. The
conclusion  from  comparison  of  ordinary  least  squares  to  alternative  analysis  methodologies  is  that 
best  linear  unbiased  prediction  and  best  linear  prediction  are  more  appropriate  to  the  problem  of 
determining  parental  worth. 

The  next  analysis  investigation  contrasts  variance  component  estimation  techniques  across 
levels  of  imbalance  for  the  half-diallel  and  half-sib  mating  designs  for  the  estimation  of  genetic 
parameters  with  plot  means  and  individuals  used  as  the  unit  of  observation.  The  criteria  for 
discrimination  are  variance  of  the  estimates,  mean  square  error,  bias  and  probability  of  nearness. 
For all estimation techniques, individuals as the unit of observation produced estimates with the
most desirable properties. Of the estimation techniques examined, restricted maximum likelihood
is  the  most  robust  to  imbalance. 

The  computer  program  used  to  produce  restricted  maximum  likelihood  estimates  of 
variance  components  was  modified  to  form  a  user  friendly  analysis  package.  Both  the  algorithm 
and  the  outputs  of  the  program  are  documented.  Outputs  available  from  the  program  include 
variance  component  estimates,  generalized  least  squares  estimates  of  fixed  effects,  asymptotic 
covariance  matrix  for  variance  components,  best  linear  unbiased  predictions  for  general  and 
specific  combining  abilities  and  the  error  covariance  matrix  for  predictions  and  estimates. 
CHAPTER 1
INTRODUCTION 


Analysis  of  quantitative  traits  in  forest  genetic  experiments  has  traditionally  been 
approached  as  a  two-part  problem.  Parental  worth  would  be  estimated  as  fixed  effects  and  later 
considered  as  random  effects  for  the  determination  of  genetic  architecture.  While  traditional,  this 
approach  is  most  probably  sub-optimal  given  the  proliferation  of  alternative  analysis  approaches 
with  enhanced  theoretical  properties  (White  and  Hodge  1989). 

In this dissertation emphasis is placed on the half-diallel mating design because of its
omnipresence and the uniqueness of the analysis problem this mating design presents. The
half-diallel mating design has been and continues to be used in plant sciences (Sprague and Tatum
1942,  Gilbert  1958,  Matzinger  et  al.  1959,  Burley  et  al.  1966,  Squillace  1973,  Weir  and  Zobel 
1975,  Wilcox  et  al.  1975,  Snyder  and  Namkoong  1978,  Hallauer  and  Miranda  1981,  Singh  and 
Singh  1984,  Greenwood  et  al.  1986,  and  Weir  and  Goddard  1986).  The  unique  feature  of  the 
half-diallel  mating  system  which  hinders  analysis  with  many  statistical  packages  is  that  a  single 
observation  contains  two  levels  of  the  same  main  effect. 

Optimality  of  mating  design  for  the  estimation  of  commonly  needed  genetic  parameters 
(single-tree  heritability,  type  B  correlation  and  dominance  to  additive  variance  ratio)  is  examined 
utilizing  the  asymptotic  covariance  of  the  variance  components  (Kendall  and  Stuart  1963, 
Giesbrecht  1983  and  McCutchan  et  al.  1989).  Since  genetic  field  experiments  are  composed  of 
both  a  mating  design  and  a  field  design,  the  central  consideration  in  this  investigation  is  which 
mating design with what field design (how many parents and across what number of locations
within a randomized complete block design) is most efficient. The criterion for discernment
among designs is the efficiency of the individual observation in reducing the variance of the
estimate (Pederson 1972). This question is considered under a range of genetic architectures
which spans that reported for coniferous growth traits (Campbell 1972, Stonecypher et al. 1973,
Snyder and Namkoong 1978, Foster 1986, Foster and Bridgwater 1986, Hodge and White [in
press]).

The  investigation  into  optimal  analysis  proceeds  by  considering  the  ordinary  least  squares 
(OLS)  treatment  of  estimating  parental  worth  for  the  half-diallel  mating  design.  OLS  assumptions 
are  examined  in  detail  through  the  use  of  matrix  algebra  for  both  balanced  and  unbalanced  data. 
The  use  of  matrix  algebra  illustrates  both  the  uniqueness  of  the  problem  and  the  interpretation 
of  the  OLS  assumptions.  Comparisons  among  OLS,  generalized  least  squares  (GLS),  best  linear 
unbiased  prediction  (BLUP)  and  best  linear  prediction  (BLP)  are  made  on  a  theoretical  basis. 

Although  consideration  of  field  and  mating  design  of  future  experiments  is  essential,  the 
problem  of  optimal  analysis  of  current  data  remains.  In  response  to  this  need,  simulated  data 
with  differing  levels  of  imbalance,  genetic  architecture  and  mating  design  is  utilized  as  a  basis 
for  discriminating  among  variance  component  estimation  techniques  in  the  determination  of 
genetic  architecture.  The  levels  of  imbalance  simulated  represent  those  commonly  seen  in  forest 
genetic  data  as  less  than  100%  survival,  missing  crosses  for  full-sib  mating  designs  and  only 
subsets  of  parents  in  common  across  location  for  half-sib  mating  designs.  The  two  mating 
designs  are  half-sib  and  half-diallel  with  a  subset  of  the  previously  used  genetic  architectures. 
The  field  design  is  a  randomized  complete  block  with  fifteen  families  per  block  and  six  trees  per 
family per block. The four criteria used to discriminate among variance component estimation
techniques  are  probability  of  nearness  (Pittman  1937),  bias,  variance  of  the  estimates  and  mean 
square  error  (Hogg  and  Craig  1978). 
The techniques compared for variance component estimation are minimum variance
quadratic unbiased estimation (Rao 1971b), minimum norm quadratic unbiased estimation (Rao
1971a), restricted maximum likelihood (Patterson and Thompson 1971), maximum likelihood
(Hartley and Rao 1967) and Henderson's method 3 (Henderson 1953). These techniques are
compared using the individual and plot means as the unit of observation. Further, three
alternatives are explored for dealing with negative variance component estimates: accepting
and living with negative estimates, setting negative estimates to zero, and re-solving the
system with negative components set to zero.

The algorithm used for the method which provided estimates with optimal properties
across experimental levels was converted to a user-friendly program. This program, which
provides restricted maximum likelihood variance component estimates, uses Giesbrecht's
algorithm (1983). Documentation of the algorithm and explanation of the program's output are
provided along with the Fortran source code (appendix).


CHAPTER 2
THE EFFICIENCY OF HALF-SIB, HALF-DIALLEL
AND CIRCULAR MATING DESIGNS IN THE ESTIMATION
OF GENETIC PARAMETERS WITH VARIABLE NUMBERS OF
PARENTS AND LOCATIONS


Introduction 

In  forest  tree  improvement,  genetic  tests  are  established  for  four  primary  purposes: 
1)  ranking  parents,  2)  selecting  families  or  individuals,  3)  estimating  genetic  parameters,  and  4) 
demonstrating  genetic  gain  (Zobel  and  Talbert  1984).  While  the  four  purposes  are  not  mutually 
exclusive,  a  test  design  optimal  for  one  purpose  is  most  probably  not  optimal  for  all  (Burdon  and 
Shelbourne  1971,  White  1987).  A  breeder  then  must  prioritize  the  purposes  for  which  a  given 
test  is  established  and  choose  a  design  based  on  these  priorities.  Within  a  genetic  test  design 
there  are  two  primary  components:  mating  design  and  field  design.  There  have  been  several 
investigations  of  optimal  designs  for  these  two  components  either  separately  or  simultaneously 
under  various  criteria.  These  criteria  have  included  the  efficient  and/or  precise  estimation  of 
heritability  (Pederson  1972,  Namkoong  and  Roberds  1974,  Pepper  and  Namkoong  1978, 
McCutchan  et  al.  1985,  McCutchan  et  al.  1989),  precise  estimation  of  variance  components 
(Braaten  1965,  Pepper  1983),  and  efficient  selection  of  progeny  (van  Buijtenen  1972,  White  and 
Hodge  1987,  van  Buijtenen  and  Burdon  1990,  Loo-Dinkins  et  al.  1990). 

Incorporated  within  this  body  of  research  has  been  a  wide  range  of  genetic  and 
environmental  variance  parameters  and  field  and  mating  designs.  However,  the  models  in 
previous  investigations  have  been  primarily  constrained  to  consideration  of  testing  in  a  single 
environment with a correspondingly limited number of factors in the model, i.e., genotype by
environment interaction and/or dominance variance are usually not considered. This chapter
focuses on optimal mating designs through consideration of three common mating designs
(half-sib, half-diallel, and circular with four crosses per parent) for estimation of genetic
parameters with a field design extending across multiple locations.

In  this  chapter  the  approach  to  the  optimal  design  problem  is  to  maintain  the  basic  field 
design  within  locations  as  randomized  complete  block  with  four  blocks  and  a  six-tree  row-plot 
representing  each  genetic  entry  within  a  block  (noted  as  one  of  the  most  common  field  designs 
by  Loo-Dinkins  et  al.  1990).  The  number  of  families  in  a  block,  number  of  locations,  mating 
design  and  number  of  parents  within  a  mating  design  are  allowed  to  change.  Since  optimality, 
besides  being  a  function  of  the  field  and  mating  designs,  is  also  a  function  of  the  underlying 
genetic  parameters,  all  designs  are  examined  across  a  range  of  levels  of  genetic  determination  (as 
varying  levels  of  heritability,  genotype  by  environment  interaction  and  dominance)  reflecting 
estimates  for  many  economically  important  traits  in  conifers  (Campbell  1972,  Stonecypher  et  al. 
1973,  Snyder  and  Namkoong  1978,  Foster  1986,  Foster  and  Bridgwater  1986,  Hodge  and  White 
(in  press)). 

For  each  design  and  level  of  genetic  determination,  a  Minimum  Variance  Quadratic 
Unbiased  Estimation  (MIVQUE)  technique  and  an  approximation  of  the  variance  of  a  ratio 
(Kendall  and  Stuart  1963,  Giesbrecht  1983  and  McCutchan  et  al.  1989)  are  applied  to  estimate 
the  variance  of  estimates  of  heritability,  additive  to  additive  plus  additive  by  environment  variance 
ratio,  and  dominance  to  additive  variance  ratio.  These  techniques  use  the  true  covariance  matrix 
of  the  variance  component  estimates  (utilizing  only  the  known  parameters  and  the  test  design  and 
precluding  the  need  for  simulated  or  real  data)  and  a  Taylor  series  approximation  of  the  variance 
of a ratio. The relative efficiencies of different test designs are compared on the basis of i (the
efficiency of an individual observation in reducing the variance of an estimate, Pederson 1972).
Thus  this  research  explores  which  mating  design,  number  of  parents  and  number  of  locations  is 
most  efficient  per  unit  of  observation  in  estimating  heritability,  additive  to  additive  plus  additive 
by  environment  variance  ratio,  and  dominance  to  additive  variance  ratio  for  several  variance 
structures  representative  of  coniferous  growth  traits. 

Methods 

Assumptions  Concerning  Block  Size 

As  opposed  to  McCutchan  et  al.  (1985),  where  block  sizes  were  held  constant  and 
including  more  families  resulted  in  fewer  observations  per  family  per  block,  in  this  chapter  the 
blocks are allowed to expand to accommodate increasing numbers of families. This expansion is
allowed without increasing either the variance among blocks or the variance within blocks. For
the  three  mating  designs  which  are  discussed,  the  addition  of  one  parent  to  the  half-sib  design 
increases  block  size  by  6  trees  (plot  for  a  half-sib  family),  the  addition  of  a  parent  to  the  circular 
design  increases  block  size  by  12  trees  (two  plots  for  full-sib  families),  and  the  addition  of  a 
parent to the half-diallel design increases block size by 6p trees (where p is the number of parents
before the addition, i.e., there are p new full-sib families per block). Therefore, block size is
determined  by  the  mating  design  and  the  number  of  parents. 
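
The block-size bookkeeping described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the count of 2p crosses for the circular design is an inference from four crosses per parent with two parents per cross.

```python
TREES_PER_PLOT = 6        # six-tree row-plot per genetic entry per block
BLOCKS_PER_LOCATION = 4   # four blocks per location


def genetic_entries(design, parents):
    """Number of families (genetic entries) planted in each block."""
    if design == "half-sib":
        return parents                        # one half-sib family per parent
    if design == "circular":
        return 2 * parents                    # four crosses per parent, two parents per cross
    if design == "half-diallel":
        return parents * (parents - 1) // 2   # every pair of parents crossed once
    raise ValueError("unknown design: " + design)


def block_size(design, parents):
    """Trees per block for a given mating design and number of parents."""
    return TREES_PER_PLOT * genetic_entries(design, parents)


def total_observations(design, parents, locations):
    """Trees in the whole experiment across all blocks and locations."""
    return BLOCKS_PER_LOCATION * locations * block_size(design, parents)
```

Under these assumptions a six-parent half-diallel has 15 crosses, so each block holds 90 trees and a two-location experiment holds 24 x 2 x 15 = 720 trees.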

All  comparisons  among  mating  designs  and  numbers  of  locations  are  for  equal  block 
sizes,  i.e.,  equal  numbers  of  observations  per  location.  This  results  in  comparing  mating  designs 
with  unequal  numbers  of  parents  in  the  designs  and  comparing  two  location  experiments  against 
five  location  experiments  with  equal  numbers  of  observations  per  location  but  unequal  total 
numbers  of  observations. 


The Use of Efficiency (i)

Efficiency  is  the  tool  by  which  comparisons  are  made  and  is  the  efficacy  of  the  individual 
observations  in  an  experiment  in  lowering  the  variance  of  parameter  estimates.  An  increasing 
efficiency  indicates  that  for  increasing  experimental  size  the  additional  observations  have 
enhanced the variance-reducing property of all observations. Efficiency is calculated as
$i = 1 / (N\,\mathrm{Var}(x))$, where N is the total number of observations and Var(x) is the variance of a generic
parameter estimate. Increasing N always results in a reduction of the variance of estimation, all
other things being equal. Yet the change in efficiency with increasing N is dependent on whether
the reduction in variance is adequate to offset the increase in N which caused the reduction.
Comparing a previous efficiency with that obtained by increasing N, i.e., increasing the number
of parents in a mating design or increasing the number of locations in which an experiment is
planted:

since $i_o = 1 / (N\,\mathrm{Var}(x))$,   (2-1)

then $N\,\mathrm{Var}(x) = 1 / i_o$

and $(N + \Delta N)(\mathrm{Var}(x) + \Delta\mathrm{Var}(x)) = 1 / i_n$;

if $i_o$ (the old efficiency) $= i_n$ (the new efficiency),

then $\Delta\mathrm{Var}(x) / \mathrm{Var}(x) = -\Delta N / (N + \Delta N)$;

if $i_o < i_n$, then $\Delta\mathrm{Var}(x) / \mathrm{Var}(x) < -\Delta N / (N + \Delta N)$;

and if $i_o > i_n$, then $\Delta\mathrm{Var}(x) / \mathrm{Var}(x) > -\Delta N / (N + \Delta N)$;

where $\Delta$ denotes the change in magnitude.
Viewing equation 2-1, if N is held constant and one design has a higher efficiency (i), the design
must also produce parameter estimates which have a lower variance.
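
A small numerical sketch may make the break-even condition concrete. The function names and the example numbers below are hypothetical and are not taken from the study; the code only restates the definition of efficiency and the equality case derived above.

```python
def efficiency(n_obs, variance):
    """Efficiency i = 1 / (N * Var(x)) of an individual observation."""
    return 1.0 / (n_obs * variance)


def break_even_relative_change(n_obs, delta_n):
    """Relative change in variance, dVar/Var, that leaves efficiency unchanged
    when the experiment grows from N to N + dN observations."""
    return -delta_n / (n_obs + delta_n)


# Hypothetical example: 1000 observations with Var(x) = 0.01, then 500 more.
i_old = efficiency(1000, 0.01)
rel = break_even_relative_change(1000, 500)
i_break_even = efficiency(1500, 0.01 * (1.0 + rel))
```

In this example the variance must fall by more than one third for the enlarged experiment to be more efficient than the original one.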


General  Methodology 

Sets  of  true  variance  components  are  calculated  in  accordance  with  a  stated  level  of 
genetic  control  and  the  design  matrix  is  generated  in  correspondence  with  the  field  and  mating 
design. Knowing the design matrix and a set of true variance components, a true covariance
matrix of variance component estimates is generated. Once the covariance matrix
of  the  variance  components  is  in  hand,  the  variance  of  and  covariances  between  any  linear 
combinations  of  the  variance  component  estimates  are  calculated.  From  the  covariance  matrix 
for  linear  combinations,  the  variance  of  genetic  ratios  as  ratios  of  linear  combinations  of  variance 
components  are  approximated  by  a  Taylor  series  expansion.  Since  definition  of  a  set  of  variance 
components  and  formation  of  the  design  matrix  are  dependent  on  the  linear  model  employed, 
discussion  of  specific  methodology  begins  with  linear  models. 

Linear  Models 

Half-diallel  and  circular  designs 

The  scalar  linear  model  employed  for  half-diallel  and  circular  mating  designs  is 

$y_{ijklm} = \mu + t_i + b_{ij} + g_k + g_l + s_{kl} + tg_{ik} + tg_{il} + ts_{ikl} + p_{ijkl} + w_{ijklm}$   (2-2)

where $y_{ijklm}$ is the m-th observation of the kl-th cross in the j-th block of the i-th test;
$\mu$ is the population mean;
$t_i$ is the random variable test environment ~ NID(0, $\sigma^2_t$);
$b_{ij}$ is the random variable block ~ NID(0, $\sigma^2_b$);
$g_k$ is the random variable female general combining ability (gca) ~ NID(0, $\sigma^2_{gca}$);
$g_l$ is the random variable male gca ~ NID(0, $\sigma^2_{gca}$);
$s_{kl}$ is the random variable specific combining ability (sca) ~ NID(0, $\sigma^2_{sca}$);
$tg_{ik}$ is the random variable test by female gca interaction ~ NID(0, $\sigma^2_{tg}$);
$tg_{il}$ is the random variable test by male gca interaction ~ NID(0, $\sigma^2_{tg}$);
$ts_{ikl}$ is the random variable test by sca interaction ~ NID(0, $\sigma^2_{ts}$);
$p_{ijkl}$ is the random variable plot ~ NID(0, $\sigma^2_p$);
$w_{ijklm}$ is the random variable within plot ~ NID(0, $\sigma^2_w$); and
there is no covariance between random variables in the model.

This linear model in matrix notation is (dimensions listed below the model):

$y = \mu 1 + Z_T e_T + Z_B e_B + Z_G e_G + Z_S e_S + Z_{TG} e_{TG} + Z_{TS} e_{TS} + Z_P e_P + e_W$   (2-3)

Dimensions: y and 1 are n×1; $Z_T$ is n×t and $e_T$ is t×1; $Z_B$ is n×b and $e_B$ is b×1; $Z_G$ is n×g
and $e_G$ is g×1; $Z_S$ is n×s and $e_S$ is s×1; $Z_{TG}$ is n×tg and $e_{TG}$ is tg×1; $Z_{TS}$ is n×ts and
$e_{TS}$ is ts×1; $Z_P$ is n×p and $e_P$ is p×1; $e_W$ is n×1;

where y is the observation vector;
$Z_i$ is the portion of the design matrix for the i-th random variable;
$e_i$ is the vector of unobservable random effects for the i-th random variable;
1 is a vector of 1's; and
n, t, b, g, s, tg, ts, and p are the number of observations, tests, blocks, gca's, sca's,
test by gca interactions, test by sca interactions and plots, respectively.
Utilizing customary assumptions in half-diallel mating designs (Method 4, Griffing 1956), the
variance of an individual observation is

$\mathrm{Var}(y_{ijklm}) = \sigma^2_t + \sigma^2_b + 2\sigma^2_{gca} + \sigma^2_{sca} + 2\sigma^2_{tg} + \sigma^2_{ts} + \sigma^2_p + \sigma^2_w$;   (2-4)

and in matrix notation the covariance matrix for the observations is

$\mathrm{Var}(y) = Z_T Z_T'\sigma^2_t + Z_B Z_B'\sigma^2_b + Z_G Z_G'\sigma^2_{gca} + Z_S Z_S'\sigma^2_{sca} + Z_{TG} Z_{TG}'\sigma^2_{tg} + Z_{TS} Z_{TS}'\sigma^2_{ts} + Z_P Z_P'\sigma^2_p + I_n\sigma^2_w$   (2-5)

where ' indicates the transpose operator, all matrices of the form $Z_i Z_i'$ are n×n, and $I_n$ is an
n×n identity matrix.
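
The feature that sets the half-diallel apart, two levels of the gca effect in a single observation, can be seen directly in the gca portion of the design matrix. The sketch below is a hypothetical illustration (function names are assumptions, with one observation per cross and no tests, blocks or plots) that builds that portion for a four-parent half-diallel.

```python
from itertools import combinations


def half_diallel_crosses(parents):
    """All crosses among the parents: no selfs and no reciprocals."""
    return list(combinations(range(parents), 2))


def gca_design_matrix(crosses, parents):
    """One row per observation (here one observation per cross); each row
    carries a 1 in the columns of both parents of the cross."""
    rows = []
    for k, l in crosses:
        row = [0] * parents
        row[k] = 1
        row[l] = 1
        rows.append(row)
    return rows


crosses = half_diallel_crosses(4)
z_g = gca_design_matrix(crosses, 4)
# Diagonal of Z_G Z_G': two gca effects contribute to every observation.
diag_zz = [sum(v * v for v in row) for row in z_g]
```

Every row of this matrix sums to two, which is the source of the coefficient 2 on the gca variance in Eq. 2-4.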
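
Half-sib design

The scalar linear model for half-sib mating designs is

$y_{ijkm} = \mu + t_i + b_{ij} + g_k + tg_{ik} + p^*_{ijk} + w^*_{ijkm}$   (2-6)

where $y_{ijkm}$ is the m-th observation of the k-th half-sib family in the j-th block of the i-th test;
$\mu$, $t_i$, $b_{ij}$, $g_k$, and $tg_{ik}$ retain the definition in Eq. 2-2;
$p^*_{ijk}$ is the random variable plot containing different genotype by environment
components than Eq. 2-2 ~ NID(0, $\sigma^2_{p^*}$);
$w^*_{ijkm}$ is the random variable within plot containing different levels of genotypic and
genotype by environment components than Eq. 2-2 ~ NID(0, $\sigma^2_{w^*}$); and
there is no covariance between random variables in the model.
The matrix notation model is

$y = \mu 1 + Z_T e_T + Z_B e_B + Z_G e_G + Z_{TG} e_{TG} + Z_P e_{P^*} + e_{W^*}$   (2-7)

Dimensions: y and 1 are n×1; $Z_T$ is n×t and $e_T$ is t×1; $Z_B$ is n×b and $e_B$ is b×1; $Z_G$ is n×g
and $e_G$ is g×1; $Z_{TG}$ is n×tg and $e_{TG}$ is tg×1; $Z_P$ is n×p and $e_{P^*}$ is p×1; $e_{W^*}$ is n×1.
The variance of an individual observation in half-sib designs is

$\mathrm{Var}(y_{ijkm}) = \sigma^2_t + \sigma^2_b + \sigma^2_{gca} + \sigma^2_{tg} + \sigma^2_{p^*} + \sigma^2_{w^*}$;   (2-8)

and $\mathrm{Var}(y) = Z_T Z_T'\sigma^2_t + Z_B Z_B'\sigma^2_b + Z_G Z_G'\sigma^2_{gca} + Z_{TG} Z_{TG}'\sigma^2_{tg} + Z_P Z_P'\sigma^2_{p^*} + I_n\sigma^2_{w^*}$   (2-9)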

Levels  of  Genetic  Determination 

Eight levels of genetic determination are derived from a factorial combination of two
levels of each of three genetic ratios: heritability
($h^2 = 4\sigma^2_{gca} / (2\sigma^2_{gca} + \sigma^2_{sca} + 2\sigma^2_{tg} + \sigma^2_{ts} + \sigma^2_p + \sigma^2_w)$ for full-sib models and
$h^2 = 4\sigma^2_{gca} / (\sigma^2_{gca} + \sigma^2_{tg} + \sigma^2_{p^*} + \sigma^2_{w^*})$ for half-sib models);
additive to additive plus additive by environment variance ratio
($r_B = \sigma^2_{gca} / (\sigma^2_{gca} + \sigma^2_{tg})$, Type B correlation of Burdon 1977); and dominance to additive
variance ratio ($\gamma = \sigma^2_{sca} / \sigma^2_{gca}$). The levels employed for each ratio are
$h^2$ = 0.1 and 0.25; $r_B$ = 0.5 and 0.8; and $\gamma$ = 0.25 and 1.0.

To generate sets of true variance components (Table 2-1) for half-diallel and circular
mating designs from the factorial combinations of genetic parameters, the denominator of $h^2$ is
set to 10 (arbitrarily, but without loss of generality) which, given the level of $h^2$, leads to the
solution for $\sigma^2_{gca}$. Solving for $\sigma^2_{gca}$ and knowing $\gamma$ yields the value for $\sigma^2_{sca}$. Knowing the level
of $r_B$ and $\sigma^2_{gca}$ allows the equation for $r_B$ to be solved for $\sigma^2_{tg}$. An assumption that $\gamma$
equals the ratio $\sigma^2_{ts} / \sigma^2_{tg}$ permits a solution for $\sigma^2_{ts}$. A further assumption that $\sigma^2_p$ is seven
percent of $\sigma^2_p + \sigma^2_w$ yields a solution for both $\sigma^2_p$ and $\sigma^2_w$. Finally, $\sigma^2_t$ and $\sigma^2_b$ are set to 1.0 and
0.5, respectively, for all treatment levels.

Table 2-1. Parametric variance components for the factorial combination of heritability (.1 and
.25), Type B correlation (.5 and .8) and dominance to additive variance ratio (.25 and 1.0) for
full- and half-sib designs. $\sigma^2_t$ and $\sigma^2_b$ were maintained at 1.0 and .5, respectively, for all levels and
designs. For half-sib rows, the last two columns hold $\sigma^2_{p^*}$ and $\sigma^2_{w^*}$.

Design  Level    $h^2$  $r_B$  $\gamma$  $\sigma^2_{gca}$  $\sigma^2_{sca}$  $\sigma^2_{tg}$  $\sigma^2_{ts}$  $\sigma^2_p$  $\sigma^2_w$
Full    1        .1     .8     1.0       .2500             .2500             .0625            .0625            .6344         8.4281
        2        .1     .5     1.0       .2500             .2500             .2500            .2500            .5950         7.9050
        3        .1     .8     .25       .2500             .0625             .0625            .0156            .6508         8.6461
        4        .1     .5     .25       .2500             .0625             .2500            .0625            .6212         8.2538
        5        .25    .8     1.0       .6250             .6250             .1562            .1562            .5359         7.1203
        6        .25    .5     1.0       .6250             .6250             .6250            .6250            .4376         5.8125
        7        .25    .8     .25       .6250             .1562             .1562            .0391            .5769         7.6649
        8        .25    .5     .25       .6250             .1562             .6250            .1562            .5031         6.6844
Half    1 and 3  .1     .8     -         .2500             -                 .0625            -                .4844         9.2031
        2 and 4  .1     .5     -         .2500             -                 .2500            -                .4750         9.0250
        5 and 7  .25    .8     -         .6250             -                 .1562            -                .4609         8.7579
        6 and 8  .25    .5     -         .6250             -                 .6250            -                .4375         8.3125

In order to facilitate comparisons of half-sib mating designs with full-sib mating designs,
$\sigma^2_{gca}$ and $\sigma^2_{tg}$ retain the same values for given levels of $h^2$ and $r_B$ and the denominator of
heritability again is set to 10. To solve for $\sigma^2_{p^*}$ and $\sigma^2_{w^*}$, the assumption that $\sigma^2_{p^*}$ is five percent
of $\sigma^2_{p^*} + \sigma^2_{w^*}$ permits a solution for $\sigma^2_{p^*}$ and $\sigma^2_{w^*}$ and maintains $\sigma^2_{p^*}$ approximately equal to and
no larger than $\sigma^2_p$ of the full-sib mating designs (Namkoong et al. 1966) for the same levels of
$h^2$ and $r_B$. Under the previous definitions all consideration of differences in $\gamma$ changing the
magnitudes of $\sigma^2_{p^*}$ and $\sigma^2_{w^*}$ is disallowed. Thus, there are only four parameter sets for the
half-sib mating design (Table 2-1).
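
The steps used to generate the parametric variance components can be sketched as below. The function and key names are hypothetical; the arithmetic follows the assumptions stated in the text (heritability denominator of 10, plot variance at seven percent of plot plus within-plot variance for full-sib designs and five percent for half-sib designs).

```python
def full_sib_components(h2, r_b, gamma, denom=10.0, plot_fraction=0.07):
    """Variance components for half-diallel and circular designs."""
    gca = h2 * denom / 4.0             # from h2 = 4 * gca / denominator
    sca = gamma * gca                  # from gamma = sca / gca
    tg = gca * (1.0 - r_b) / r_b       # from r_B = gca / (gca + tg)
    ts = gamma * tg                    # assumption: gamma = ts / tg
    residual = denom - (2.0 * gca + sca + 2.0 * tg + ts)   # plot + within
    plot = plot_fraction * residual
    return {"gca": gca, "sca": sca, "tg": tg, "ts": ts,
            "plot": plot, "within": residual - plot}


def half_sib_components(h2, r_b, denom=10.0, plot_fraction=0.05):
    """Variance components for half-sib designs (no sca or ts terms)."""
    gca = h2 * denom / 4.0
    tg = gca * (1.0 - r_b) / r_b
    residual = denom - (gca + tg)      # plot* + within*
    plot = plot_fraction * residual
    return {"gca": gca, "tg": tg, "plot": plot, "within": residual - plot}


level_1_full = full_sib_components(0.1, 0.8, 1.0)
level_1_half = half_sib_components(0.1, 0.8)
```

With these inputs the sketch gives plot and within-plot components of about 0.6344 and 8.4281 for full-sib level 1 and about 0.4844 and 9.2031 for the corresponding half-sib set, in agreement with Table 2-1.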

Covariance  Matrix  for  Variance  Components 

The  base  algorithm  to  produce  the  covariance  matrix  for  variance  component  estimates 
is  from  Giesbrecht  (1983)  and  was  rewritten  in  Fortran  for  ease  of  handling  the  study  data.  In 
using  this  algorithm,  we  assume  that  all  random  variables  are  independent  and  normally 
distributed  and  that  the  true  variances  of  the  random  variables  are  known.  Under  these 
assumptions,  Minimum  Norm  Quadratic  Unbiased  Estimation  (MINQUE,  Rao  1972)  using  the 
true  variance  components  as  priors  (the  starting  point  for  the  algorithm)  becomes  MIVQUE  (Rao 
1971b),  which  requires  normality  and  the  true  variance  components  as  priors  (Searle  1987),  and 
for  a  given  design  the  covariance  matrix  of  the  variance  component  estimates  becomes  fixed.  A 
sketch of the steps from the MIVQUE equation (Eq. 2-10, Giesbrecht 1983, Searle 1987) to the
true covariance matrix for variance component estimates is

$\{\mathrm{tr}(QV_iQV_j)\}\,\hat{\sigma}^2 = \{y'QV_iQy\}$   (2-10)
(dimensions r×r, r×1 and r×1, respectively)

then $\hat{\sigma}^2 = \{\mathrm{tr}(QV_iQV_j)\}^{-1}\{y'QV_iQy\}$

and $\mathrm{Var}(\hat{\sigma}^2) = \{\mathrm{tr}(QV_iQV_j)\}^{-1}\,\mathrm{Var}(\{y'QV_iQy\})\,\{\mathrm{tr}(QV_iQV_j)\}^{-1}$
(each matrix r×r)

where $\{a_{ij}\}$ is a matrix whose elements are $a_{ij}$, where in the full-sib designs i = 1
to 8 and j = 1 to 8, i.e., there is a row and column for every random
variable in the linear model;
tr is the trace operator, that is, the sum of the diagonal elements of a
matrix;
$Q = V^{-1} - V^{-1}X(X'V^{-1}X)^{-}X'V^{-1}$ for V = the covariance matrix of y and X as
the design matrix for fixed effects;
$V_i = Z_iZ_i'$, where i = the random variables test, block, etc.;
$\hat{\sigma}^2$ is the vector of variance component estimates; and
r is the number of random variables in the model.
The variance of a quadratic form (where A is any non-negative definite matrix of proper
dimension) under normality is $\mathrm{Var}(y'Ay) = 2\,\mathrm{tr}(AVAV) + 4\mu'AVA\mu$ (Searle 1987), with $\mu$ here
denoting the mean vector of y; however, MINQUE derivation (Rao 1971) requires that AX = 0,
which in our case is A1 = 0, so the second term is zero, thus

$\mathrm{Var}(\{y'QV_iQy\}) = 2\{\mathrm{tr}(QV_iQV_j)\}$;   (2-11)

and using Eq. 2-10 and Eq. 2-11, $\mathrm{Var}(\hat{\sigma}^2) = \{\mathrm{tr}(QV_iQV_j)\}^{-1}\,2\{\mathrm{tr}(QV_iQV_j)\}\,\{\mathrm{tr}(QV_iQV_j)\}^{-1}$
and therefore $\mathrm{Var}(\hat{\sigma}^2) = V_{vc} = 2\{\mathrm{tr}(QV_iQV_j)\}^{-1}$.   (2-12)

From  Eq.2-12  it  is  seen  that  the  MIVQUE  covariance  matrix  of  the  variance  component  estimates 
is  dependent  only  on  the  design  matrix  (the  result  of  the  field  design  and  mating  design)  and  the 
true  variance  components;  a  data  vector  is  not  needed. 
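
As a check on the mechanics of Eq. 2-12, the sketch below evaluates twice the inverse of the trace matrix for a deliberately small case. It is not the Fortran program used in the study: the balanced one-way layout (four groups of three observations, group variance 1.0 and within variance 2.0) is a hypothetical stand-in for the much larger genetic designs, the helper names are assumptions, and the only fixed effect is taken to be the mean (X = 1).

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def mivque_covariance(z_list, sigma2):
    """V_vc = 2 {tr(Q V_i Q V_j)}^-1 for known variance components, X = 1."""
    n = len(z_list[0])
    v_parts = [matmul(z, transpose(z)) for z in z_list]        # V_i = Z_i Z_i'
    v = [[sum(s * vp[r][c] for s, vp in zip(sigma2, v_parts)) for c in range(n)]
         for r in range(n)]
    v_inv = inverse(v)
    vinv_one = [sum(row) for row in v_inv]                      # V^-1 1
    denom = sum(vinv_one)                                       # 1' V^-1 1
    q = [[v_inv[r][c] - vinv_one[r] * vinv_one[c] / denom for c in range(n)]
         for r in range(n)]
    qv = [matmul(q, vp) for vp in v_parts]
    info = [[trace(matmul(qi, qj)) for qj in qv] for qi in qv]
    return [[2.0 * x for x in row] for row in inverse(info)]


# Hypothetical balanced one-way layout: 4 groups, 3 observations per group.
z_group = [[1.0 if j == g else 0.0 for j in range(4)] for g in range(4) for _ in range(3)]
z_within = [[1.0 if i == j else 0.0 for j in range(12)] for i in range(12)]
cov = mivque_covariance([z_group, z_within], [1.0, 2.0])
```

No data vector appears anywhere in the calculation; only the design matrices and the assumed true variance components are needed, as noted above.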

Covariance  Matrix  for  Linear  Combinations  of  Variance  Components  and  Variance  of  a  Ratio 

Once  the  covariance  matrix  for  the  variance  component  estimates  (Eq.2-12)  is  created, 
then  the  covariance  matrix  of  linear  combinations  of  these  variance  components  is  formed  as 

$V_k = L'V_{vc}L$   (2-13)
(dimensions: $V_k$ is 2×2, $L'$ is 2×r, $V_{vc}$ is r×r and L is r×2)

where L specifies the linear combinations of the variance components which are the
combinations of variance components in the denominator and numerator of the genetic ratio being
estimated. A Taylor series expansion (first approximation) for the variance of a ratio using the
variances of and covariance between numerator and denominator is then applied using the
elements of $V_k$ to produce the approximate variance of the three ratio estimates as (Kendall and
Stuart 1963):

$\mathrm{Var}(\mathrm{ratio}) \cong (1/D)^2\,V_k(1,1) - 2(N/D^3)\,V_k(1,2) + (N^2/D^4)\,V_k(2,2)$   (2-14)

where the generic ratio is N/D and N and D are the parametric values;

$V_k(1,1)$ is the variance of N;

$V_k(1,2)$ is the covariance between N and D; and

$V_k(2,2)$ is the variance of D.
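
The two steps, forming the covariance matrix of the numerator and denominator and then applying the first-order Taylor approximation, can be sketched as follows. The function names and the two-component example are hypothetical and serve only to show the arithmetic.

```python
def linear_combination_covariance(l_matrix, v_vc):
    """V_k = L' V_vc L, where L is r x 2 with the numerator weights in
    column 0 and the denominator weights in column 1."""
    r = len(v_vc)
    cols = [[l_matrix[i][c] for i in range(r)] for c in range(2)]

    def quad(a, b):
        return sum(a[i] * v_vc[i][j] * b[j] for i in range(r) for j in range(r))

    return [[quad(cols[0], cols[0]), quad(cols[0], cols[1])],
            [quad(cols[1], cols[0]), quad(cols[1], cols[1])]]


def ratio_variance(num, den, v_k):
    """First-order Taylor approximation of Var(N/D) at the parametric N and D."""
    return (v_k[0][0] / den ** 2
            - 2.0 * num * v_k[0][1] / den ** 3
            + num ** 2 * v_k[1][1] / den ** 4)


# Hypothetical example: ratio a / (a + b) with true values a = 1 and b = 3.
v_vc = [[0.04, 0.0], [0.0, 0.01]]     # covariance matrix of the two estimates
l_matrix = [[1.0, 1.0], [0.0, 1.0]]   # numerator = a, denominator = a + b
v_k = linear_combination_covariance(l_matrix, v_vc)
var_ratio = ratio_variance(1.0, 4.0, v_k)
```

For a Type B correlation, for example, the numerator column of L would select the gca variance and the denominator column would select the gca and test by gca variances.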

Comparison  Among  Estimates  of  Variances  of  Ratios 

The approximate variances of the three ratio estimates ($h^2$, $r_B$, and $\gamma$) are compared across
mating  designs  with  equal  (or  approximately  equal)  numbers  of  observations,  across  numbers  of 
locations,  and  across  numbers  of  parents  within  a  mating  design  all  within  a  level  of  genetic 
determination.  The  standard  for  comparison  is  i.  Results  are  presented  by  the  genetic  ratio 
estimated  so  that  direct  comparisons  may  be  made  among  the  mating  designs  for  equal  numbers 
of  observations  within  a  number  of  locations  for  varying  levels  of  genetic  control.  Number  of 
genetic  entries  (number  of  crosses  for  full-sib  designs  and  number  of  half-sib  families  for  half-sib 
designs)  is  used  as  a  proxy  for  number  of  observations  since,  for  all  designs,  number  of 
observations  equals  twenty-four  times  the  number  of  locations  times  the  number  of  genetic 
entries. Further, by plotting the two levels of numbers of locations on a single figure, a
comparison is made of the utility of replication of a design across increasing numbers of
locations.

Efficiency  plots  also  permit  contrasts  of  the  absolute  magnitude  of  variance  of  estimation 
among  designs.  For  a  given  number  of  genetic  entries  and  locations,  the  design  with  the  highest 
efficiency  is  the  most  precise  (lowest  variance  of  estimation).  Increasing  the  number  of  genetic 
entries  or  locations  always  results  in  greater  precision  (lower  variance  of  estimation),  but  is  not 
necessarily  as  efficient  (the  reduction  in  variance  was  not  sufficient  to  offset  the  increase  in 
numbers  of  observations).  A  primary  justification  for  using  the  efficiency  of  a  design  as  a 
criterion  is  that  a  more  precise  estimate  of  a  genetic  ratio  is  obtained  by  using  the  mean  of  two 
estimates  from  replication  of  the  small  design  as  two  disconnected  experiments  as  opposed  to  the 
estimate  from  single  large  design.  This  is  true  when  1)  the  number  of  observations  in  the  large 
design  (N)  equals  twice  the  number  of  observations  in  small  design  (nj,  2)  the  small  design  is 
more  efficient,  and  3)  the  variances  are  homogeneous.   This  is  proven  below: 

Since $N = n_1 + n_2$

and $n_1 = n_2$,

then $N = 2n_1$.

By definition $i = 1/(N \cdot \mathrm{Var}(\mathrm{Ratio}))$;

and $\mathrm{Var}(\mathrm{Ratio}) = 1/(i \cdot N)$.

The proposition is $(\mathrm{Var}_s(\mathrm{Ratio}) + \mathrm{Var}_s(\mathrm{Ratio}))/4 < \mathrm{Var}_l(\mathrm{Ratio})$;

substitution gives $\left(1/(n_1 i_s) + 1/(n_1 i_s)\right)/4 < 1/(N i_l)$.

Simplification yields $1/(2 n_1 i_s) < 1/(N i_l)$;

and multiplication by $N$ produces $1/i_s < 1/i_l$    (2-15)

which is strictly true so long as $i_s > i_l$, where $i_s$ is the efficiency of the smaller experiment and
$i_l$ is the efficiency of the larger experiment.
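The argument above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and the efficiency values are hypothetical, and the only relationships used are Var(Ratio) = 1/(i*N) and N = 2n_1 from the proof.

```python
def variance_from_efficiency(efficiency, n_obs):
    """Var(Ratio) = 1 / (i * N), the inverse of the efficiency definition."""
    return 1.0 / (efficiency * n_obs)


def compare_disconnected(n_small, i_small, i_large):
    """Return (variance of the mean of two small-design estimates,
    variance of the single large-design estimate) with N = 2 * n_small."""
    n_large = 2 * n_small
    var_small = variance_from_efficiency(i_small, n_small)
    # Mean of two independent estimates with equal variance: (v + v) / 4.
    var_mean_of_two = (var_small + var_small) / 4.0
    var_large = variance_from_efficiency(i_large, n_large)
    return var_mean_of_two, var_large


# Hypothetical efficiencies; 24 observations per genetic entry per location.
n_small = 24 * 2 * 15  # two locations, 15 genetic entries
var_two, var_one = compare_disconnected(n_small, i_small=0.010, i_large=0.008)
print(var_two < var_one)
```

When the smaller design is the more efficient one, the mean of the two disconnected estimates has the lower variance; reversing the efficiencies reverses the comparison.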
Results

Heritability 

Half-sib  designs  are  almost  globally  superior  to  the  two  full-sib  designs  in  precision  of 
heritability  estimates  (results  not  shown  for  variance  but  may  be  seen  from  efficiencies  in  Figure 
2-1).  For  designs  of  equal  size,  half-sib  designs  excel  with  the  exception  of  genetic  level  three 
(Figure 2-1c, h2 = 0.1, rB = 0.8, and γ = 0.25). In genetic level three, the circular design
provides the most precise estimate of h2 for two-location designs; however, when the design is
extended  across  five  locations,  the  half-sib  mating  design  again  provides  the  most  precise 
estimates.  The  circular  mating  design  is  superior  in  precision  to  the  half-diallel  design  across  all 
levels  of  genetic  control  and  location,  even  with  a  relatively  large  number  of  crosses  per  parent 
(four). 

Half-sib designs are, in general (seven genetic control levels out of eight, Figure 2-1),
more efficient, the exception being level three across two locations (Figure 2-1c). For the
circular  and  half-sib  mating  designs  considered,  increasing  the  number  of  genetic  entries  always 
improves  the  efficiency  of  the  design.  However,  definite  optima  exist  for  the  half-diallel  mating 
design  for  number  of  genetic  entries,  i.e.,  crosses  which  convert  to  a  specific  number  of  parents. 
These  optima  are  not  constant  but  tend  to  be  six  parents  or  less,  lower  with  increasing  h2  or 
number  of  locations.  The  six-parent  half-diallel  is  never  far  from  the  half-diallel  optima,  and 
increasing  the  number  of  parents  past  the  optimum  results  in  decreased  efficiency. 

For  half-sib  designs  with  h2  =  0.1,  five  locations  are  more  efficient  than  two  locations; 
however,  at  h2  =  0.25  two  locations  are  most  efficient.  Further,  the  number  of  locations 
required  to  efficiently  estimate  h2  for  half-sib  designs  is  determined  only  by  the  level  of  h2  and 
does not depend on the levels of the other ratios. Although estimates over larger numbers of
observations are more precise (five-location estimates are more precise than two-location
estimates), the efficiency (increase in precision per unit observation) declines. Thus, if h2 =
0.25 and estimates of a certain precision are required, disconnected sets of two-location
experiments are preferred to five-location experiments. The relative efficiency of five locations
versus two locations is enhanced with decreasing rB (increasing genotype by environment
interaction) within a level of h2 (compare Figures 2-1a to 2-1b and 2-1c to 2-1d for h2 = 0.1, and
2-1e to 2-1f and 2-1g to 2-1h for h2 = 0.25). Yet, this enhancement is not sufficient to cause
a change in efficiency ranking between the location levels.

The  full-sib  designs  differ  markedly  from  this  pattern  (Figure  2-1)  in  that,  for  these 
parameter  levels,  it  is  never  more  efficient  to  increase  the  number  of  locations  from  two  to  five 
for  heritability  estimation.  As  observed  with  half-sib  designs,  for  full-sib  designs  the  relative 
efficiency status of five locations improves with decreasing rB. To further contrast mating designs
note that the efficiency status of full-sib designs relative to the half-sib design improves with
decreasing γ and increasing rB (Figures 2-1b versus 2-1c and 2-1f versus 2-1g).

Type  B  Correlation 

As  opposed  to  h2  estimation,  no  mating  design  performs  at  or  near  the  optima  for 
precision  of  rB  estimates  across  all  levels  of  genetic  control  (Figure  2-2).  However,  the  circular 
mating  designs  produce  globally  more  precise  estimates  than  those  of  the  half-diallel  mating 
design.  In  general,  the  utility  of  full-sib  versus  half-sib  designs  is  dependent  on  the  level  of  rB. 
The  lower  rB  value  favors  half-sib  designs  while  the  higher  rB  tends  to  favor  full-sib  designs 
(compare  Figures  2-2a  to  2-2b,  2-2c  to  2-2d,  2-2e  to  2-2f  and  2-2g  to  2-2h). 
Decreasing γ and lowering h2 always improve the efficiency of full-sib designs relative to
half-sib designs (compare Figures 2-2c and 2-2d to 2-2e and 2-2f).
Figure 2-3. Efficiency (i) for γ plotted against number of genetic entries for four levels of
genetic control for circular, half-diallel, and half-sib mating designs across levels of location
where i = 1/(N(Var(γ))) and N = the total number of observations.
For estimation of rB, full-sib designs are more efficient than half-sib designs except in the
three cases of low rB (0.5) and high γ (1.0) for h2 = 0.1 (Figure 2-2b) and low rB for h2 = 0.25
(Figures 2-2f and 2-2h). Within full-sib designs the circular design is globally superior to the
half-diallel. As with h2 estimation, half-diallel designs have optimal levels for numbers of
parents. The six-parent half-diallel is again close to these optima for all genetic levels and
numbers of locations.

At low h2 for full-sib designs, planting in two locations is always more efficient than five
locations. For half-sib designs at low h2, the relative efficiency of two versus five locations is
dependent on the level of rB with lower rB favoring replication across more locations. At h2 =
0.25, half-sib designs are more efficient when replicated across five locations. At the higher h2
value, full-sib design efficiency across locations is dependent on the level of rB. With rB = 0.5
and h2 = 0.25, replication of full-sib designs is for the first time more efficient across five
locations than across two locations; however, at the higher rB level two locations is again the
preferred number.

Dominance  to  Additive  Variance  Ratio 

In comparing the two full-sib designs for relative efficiency in estimating γ, the circular
design is always approximately equal to or, for most cases, superior to the half-diallel design
(Figure 2-3). The relative superiority of the circular design is enhanced by decreasing γ and rB
(not  shown).  The  half-diallel  design  again  demonstrates  optima  for  number  of  parents  with  the 
six-parent  design  being  near  optimal.  Within  a  mating  design  the  use  of  two  locations  is  always 
more  efficient  than  the  use  of  five  locations.  The  magnitude  of  this  superiority  escalates  with 
increasing  rB  and  h2  (Figures  2-3a  and  2-3b  versus  2-3c  and  2-3d). 
Discussion

Comparison  of  Mating  Designs 

A  priori  knowledge  of  genetic  control  is  required  to  choose  the  optimal  mating  and  field 
design for estimation of h2, rB and γ. Given that such knowledge may not be available, the
choices  are  then  based  on  the  most  robust  mating  designs  and  field  designs  for  the  estimation  of 
certain  of  the  genetic  ratios.  If  h2  is  the  only  ratio  desired,  then  the  half-sib  mating  design  is 
best.  Estimation  of  both  h2  and  rB  requires  a  choice  between  the  half-sib  and  circular  designs. 
If  there  is  no  prior  knowledge  then  the  selection  of  a  mating  design  is  dependent  on  which  ratio 
has the highest priority. For experiments in which h2 receives the highest weighting, the half-sib
design is preferred; in the alternative case the circular design is the better choice. In the last
scenario  information  on  all  three  ratios  is  desired  from  the  same  experiment  and  in  this  case  the 
circular  design  is  the  better  selection  since  the  circular  design  is  almost  globally  more  efficient 
than  the  half-diallel  design. 

After  choosing  a  mating  design,  the  next  decision  is  how  many  locations  per  experiment 
are  required  to  optimize  efficiency.  For  the  half-sib  design  the  number  of  locations  required  to 
optimize  efficiency  is  dependent  on  both  the  ratio  being  estimated  and  the  level  of  genetic  control. 
A broad inference is that for h2 estimation a two-location experiment is more efficient and for rB
a five-location experiment has the better efficiency. Estimation of any of the three ratios with
a full-sib design is almost globally more efficient in two-location experiments.

The  disparity  between  the  behavior  of  the  half-sib  and  full-sib  designs  with  respect  to  the 
efficiency  of  location  levels  can  be  explained  in  terms  of  the  genetic  connectedness  offered  by  the 
different  designs.  Genetic  connectedness  can  be  viewed  as  commonality  of  parentage  among 
genetic  entries.   The  more  entries  having  a  common  parent  the  more  connectedness  is  present. 
The half-sib design is only connected across locations by the one common parent in a half-sib
family in each replication. Full-sib designs are connected across locations in each replication by
the full-sib cross plus the number of parents minus two (half-diallel) or three (circular) for each
of the two parents in a cross. The connectedness in a full-sib design means each observation is
providing information about many other observations. The result of this connectedness is that,
in general, fewer observations (number of locations) are required for maximum efficiency.

A  General  Approach  to  the  Estimation  Problem 

The  estimation  problems  may  be  viewed  in  a  broader  context  than  the  specific  solutions 
in  this  chapter.  The  technique  for  comparison  of  mating  designs  and  numbers  of  locations  across 
levels  of  genetic  determination  may  be  construed,  for  the  case  of  h2  estimation,  to  be  the  effect 
of these factors on the variance of $\sigma^2_g$ estimates. Viewing the variance approximation formula,
the conclusion may be reached that the variance of $\sigma^2_g$ estimates is the controlling factor in the
variance of h2 estimates since the other factors at these heritability levels are multiplied by
constants which reduce their impact dramatically. Given this conclusion, the variance of h2
estimates is essentially the (3,3) element in 2{tr(QViQVj)}^-1 (Eq. 2-11). Further, since the
covariances of the other variance component estimates with $\sigma^2_g$ estimates are small, the variance
of $\sigma^2_g$ estimates is basically determined by the magnitude of the (3,3) element of {tr(QViQVj)},
which is tr(QVgQVg). Thus, the variance of h2 estimates is minimized by maximizing
tr(QVgQVg); h2 is used as the illustration because this simplification is possible.
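As an illustration of the quantity 2{tr(QViQVj)}^-1, the sketch below computes it for an arbitrary set of variance component matrices. Equation 2-11 is not reproduced in this section, so the form of Q used here (the usual REML projection matrix, V^-1 - V^-1 X (X'V^-1 X)^-1 X'V^-1) is an assumption, as are the function name and the small one-way family layout with two components (the chapter's own models carry more components, with the genetic one in position three).

```python
import numpy as np


def asymptotic_cov(X, V_components, sigmas):
    """Return 2 * {tr(Q V_i Q V_j)}^-1 for the given variance components.

    X            : (n, m) fixed-effect design matrix (full column rank assumed)
    V_components : list of (n, n) matrices V_i, one per variance component
    sigmas       : variance component values, so V = sum(sigma_i * V_i)
    """
    V = sum(s * Vi for s, Vi in zip(sigmas, V_components))
    V_inv = np.linalg.inv(V)
    # Assumed form of Q (the usual REML projection matrix).
    Q = V_inv - V_inv @ X @ np.linalg.inv(X.T @ V_inv @ X) @ X.T @ V_inv
    QV = [Q @ Vi for Vi in V_components]
    k = len(V_components)
    info = np.array([[np.trace(QV[i] @ QV[j]) for j in range(k)]
                     for i in range(k)])
    return 2.0 * np.linalg.inv(info)


# Hypothetical layout: 8 families of 5 trees, overall mean as the only fixed effect.
n_families, n_trees = 8, 5
Z = np.kron(np.eye(n_families), np.ones((n_trees, 1)))
X = np.ones((n_families * n_trees, 1))
Vg, Ve = Z @ Z.T, np.eye(n_families * n_trees)
cov_hat = asymptotic_cov(X, [Vg, Ve], sigmas=[0.1, 0.9])
print(cov_hat)
```

The diagonal elements of the returned matrix are the approximate variances of the variance component estimates; a larger tr(QVgQVg) corresponds to a smaller variance for the genetic component.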

Considering  the  impact  of  changing  levels  of  genetic  control,  while  holding  the  mating 
and  field  designs  constant,  Vg  is  fixed,  the  diagonal  elements  of  V  are  fixed  at  1 1.5  because  of 
our  assumptions,  and  only  the  off-diagonal  elements  of  V  change  with  genetic  control  levels. 
Since Q is a direct function of V^-1, what we observe in Figure 2-1 comparing a design across
levels of genetic control are changes in V^-1 brought about by changes in the magnitude of the
off-diagonal elements of V (covariances among observations). The effect of positive (the linear
model specifies that all off-diagonal elements in V are zero or positive) off-diagonal elements on
V^-1 is to reduce the magnitude of the diagonal elements and often also result in negative
off-diagonal elements. If one increases the magnitude of the off-diagonal elements in V, then the
magnitude of the diagonal elements of V^-1 is reduced and the magnitude of negative off-diagonal
elements is increased. Since tr(QVgQVg) is the sum of the squared elements of the product of
a direct function of V^-1 and a matrix of non-negative constants (Vg), as the diagonal elements of
V^-1 are reduced and the off-diagonal elements become more negative, tr(QVgQVg) must become
smaller and the variance of h2 estimates increases.
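The effect of the off-diagonal elements can be seen in a deliberately simplified case. The sketch below is an assumption-laden illustration rather than the chapter's design: it uses a single-site one-way family layout, holds the diagonal of V at 1 instead of the value used in this chapter, takes Q to be the usual REML projection matrix, and uses a hypothetical function name.

```python
import numpy as np


def trace_qvg(t, n_families=10, n_per_family=6):
    """tr(Q Vg Q Vg) for a one-way family layout with intraclass correlation t.

    Diagonal elements of V are held at 1 while the within-family
    off-diagonal elements equal t.
    """
    n_obs = n_families * n_per_family
    Z = np.kron(np.eye(n_families), np.ones((n_per_family, 1)))
    Vg = Z @ Z.T                        # fixed matrix of 0/1 constants
    V = t * Vg + (1.0 - t) * np.eye(n_obs)
    X = np.ones((n_obs, 1))             # overall mean only
    V_inv = np.linalg.inv(V)
    Q = V_inv - V_inv @ X @ np.linalg.inv(X.T @ V_inv @ X) @ X.T @ V_inv
    QVg = Q @ Vg
    return np.trace(QVg @ QVg)


# Larger off-diagonal elements (larger t) give a smaller trace.
for t in (0.05, 0.10, 0.25):
    print(t, trace_qvg(t))
```

For this layout the trace has the closed form (a-1)n^2/(1 - t + nt)^2 for a families of n trees, which decreases as t grows, in line with the reasoning above.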

Mating  designs  may  be  compared  by  the  same  type  of  reasoning.  Within  a  constant  field 
design  changes  in  mating  design  produce  alterations  in  V.  Of  the  three  designs  the  half-sib 
produces  a  V  matrix  with  the  most  zero  off-diagonal  elements,  the  circular  design  next,  and  the 
half-diallel  the  fewest  number  of  zero  off-diagonal  elements.  Knowing  the  effect  of  off-diagonal 
elements on the variance of h2 estimates, one could surmise that the variance of estimates
increases in the order of fewest to most non-zero off-diagonal elements. This tenet is in basic
agreement  with  the  results  in  Figures  2-1  through  2-3. 

The effects of rB and γ on the variance of h2 estimates can also be interpreted utilizing
the above approach. In the results section of this chapter it is noted that decreasing the magnitude
of rB and/or γ causes full-sib designs to rise in efficiency relative to the half-sib design. In
accordance with our previous arguments this would be expected since decreasing the magnitude
of those two ratios causes a decrease in the magnitude of off-diagonal elements. More precisely,
decreasing γ results in the reduction of off-diagonal elements in V of the full-sib designs while
not affecting the half-sib design, and decreasing rB results in the reduction of off-diagonal
elements in V of full-sib and half-sib designs. Relative increases in efficiency of full-sib designs
result from the elements due to location by additive interaction occurring much less frequently
in the half-sib designs; thus, the relative impact of reduction in rB in half-sib designs is less than
that for full-sibs.

Use  of  the  Variance  of  a  Ratio  Approximation 

Use  of  Kendall  and  Stuart's  (1963)  first  approximation  (first-term  Taylor  series 
approximation)  of  the  variance  of  a  ratio  has  two  major  caveats.  The  approximation  depends  on 
large  sample  properties  to  approach  the  true  variance  of  the  ratio,  i.e.,  with  a  small  number  of 
levels  for  random  variables  the  approximation  does  not  necessarily  closely  approximate  the  true 
variance  of  the  ratio.  Work  by  Pederson  (1972)  suggests  that  for  approximating  the  variance  of 
h2  at  least  ten  parents  are  required  in  diallels  before  the  approximation  will  converge  to  the  true 
variance  even  after  including  Taylor  series  terms  past  the  first  derivative.  Pederson's  work  also 
suggests  that  the  approximation  is  progressively  worse  for  increasing  heritability  with  low 
numbers of parents. Using the field design in this chapter (two locations, four blocks and six-tree
row-plots), simulation work (10,000 data sets) has demonstrated that, with a heritability of 0.1
and four parents in a half-diallel across two locations, the variance of a ratio approximation
yields a variance estimate for h2 of 0.1 while the convergent value for the simulation was 0.08
(Huber  unpublished  data).  One  should  remember  the  dependence  of  the  first  approximation  of 
the  variance  of  a  ratio  on  large  sample  properties  when  applying  the  technique  to  real  data. 
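For reference, the first-order Taylor series (delta-method) approximation of the variance of a ratio can be written as a short function. This is a sketch of the standard first-order form, assumed here to correspond to the first approximation cited above; the function name and the input values are hypothetical and are not taken from the study.

```python
def ratio_variance_first_order(mean_num, mean_den, var_num, var_den, cov_num_den):
    """First-order Taylor (delta-method) approximation of Var(num / den)."""
    ratio = mean_num / mean_den
    return ratio ** 2 * (
        var_num / mean_num ** 2
        + var_den / mean_den ** 2
        - 2.0 * cov_num_den / (mean_num * mean_den)
    )


# Hypothetical moments of a numerator and denominator built from
# variance component estimates.
approx = ratio_variance_first_order(
    mean_num=1.0, mean_den=2.0, var_num=0.04, var_den=0.09, cov_num_den=0.01
)
print(approx)
```

Because only the means, variances, and covariance enter the formula, the approximation cannot reflect skewness in the component estimates, which is one reason it depends on large-sample behavior.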

The  second  caveat  is  that  the  range  of  estimates  of  the  denominator  of  the  ratio  cannot 
pass  through  zero  (Kendall  and  Stuart  1963).  This  constraint  is  of  no  concern  for  h2;  however, 
the structure of rB and γ denominators allows unbiased minimum variance estimates of those
denominators to pass through zero, which means that at one point in the distribution of the estimates
of the ratios they are undefined (the distributions of these ratio estimates are not continuous).
Simulation has shown that the variances of rB and γ are much greater than the approximation
would indicate (Huber unpublished data). The discrepancy in variance of the estimates could be
partially alleviated through using a variance component estimation technique which restricts
estimates to the parameter space $0 < \sigma^2 < \infty$. Nevertheless, because of the two caveats,
approximations of the variance of h2, rB and γ estimates should be viewed only on a relative basis
for comparisons among designs and not on an absolute scale.

Additionally, the expectation of a ratio does not equal the ratio of the expectations (Hogg
and Craig 1978). If a value of genetic ratios is sought so that the value equals the ratio of the
expectations, then the appropriate way to calculate the ratio would be to take the mean of
variance components or linear combinations of variance components across many experiments and
then take the ratio. If the value sought for h2 is the expectation of the ratio, then taking the mean
of many h2 estimates is the appropriate approach. Returning to the results from simulated data
(10,000 data sets) where the h2 value was set at 0.1, using the ratio of the means of variance
components rendered a value of 0.1 for h2, the mean of the h2 estimates returned a value of 0.08,
and a Taylor series approximation of the mean of the ratio yielded 0.07 (Pederson 1972).
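The distinction between the ratio of the means and the mean of the ratios is easy to reproduce with simulated values. The sketch below does not use the simulation described above: the gamma distributions, their parameters, and the simple ratio additive/(additive + residual) are hypothetical stand-ins chosen only so that the true ratio of expectations is 0.1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experiments = 10_000

# Hypothetical component "estimates", one pair per simulated experiment.
additive = rng.gamma(shape=1.0, scale=0.1, size=n_experiments)       # mean 0.1
residual = rng.gamma(shape=200.0, scale=0.0045, size=n_experiments)  # mean 0.9

# Ratio of the means: average the components first, then form the ratio.
ratio_of_means = additive.mean() / (additive.mean() + residual.mean())

# Mean of the ratios: form the ratio in each experiment, then average.
mean_of_ratios = (additive / (additive + residual)).mean()

print(ratio_of_means, mean_of_ratios)
```

In this setup the per-experiment ratio is a concave function of the highly variable numerator component, so averaging the ratios gives a smaller value than forming the ratio of the averaged components.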

Conclusions 

Results  from  this  study  should  be  interpreted  as  relative  comparisons  of  the  levels  of  the 
factors  investigated.  However,  viewing  the  optimal  design  problem  as  illustrated  in  the 
discussion  section  of  this  chapter  can  provide  insight  to  the  more  general  problem. 

There  is  no  globally  most  efficient  number  of  locations,  parents  or  mating  design  for  the 
three  ratios  estimated  even  within  the  restricted  range  of  this  study;  yet,  some  general  conclusions 
can  be  drawn.    For  estimating  h2  the  half-sib  design  is  always  optimal  or  close  to  optimal  in 
terms of variance of estimation and efficiency. In the estimation of rB and γ, the circular mating
design is always optimal or near optimal in variance reduction and efficiency. Across numbers
of parents within a mating design only the half-diallel shows optima for efficiency. The other
mating designs have non-decreasing efficiency plots over the levels of number of parents, so that
while there is an optimal number of locations for a level of genetic control, the number of genetic
entries per location is limited more by operational than efficiency constraints.

Two  locations  is  a  near  global  optimum  over  five  locations  for  the  full-sib  mating  designs. 
Within  the  half-sib  mating  design  optimality  depends  on  the  levels  of  h2  and  rB:  1)  for  h2 
estimation  the  optimal  number  of  locations  is  inversely  related  to  the  level  of  h2,  i.e.  at  the  higher 
level  two  tests  were  optimal  and  at  the  lower  level  five  tests  were  optimal;  and  2)  for  rB 
estimation  for  the  half-sib  design,  the  optimal  number  of  locations  was  also  inversely  related  to 
the  level  of  rB. 

Means  of  estimates  from  disconnected  sets  provide  lower  variance  of  estimation  where 
the  smaller  experiments  have  higher  efficiencies.  Thus,  disconnected  sets  are  preferred  according 
to  number  of  locations  for  all  mating  designs  and  according  to  number  of  parents  for  the  half- 
diallel  mating  design. 

In  practical  consideration  of  the  optimal  mating  design  problem,  the  results  of  this  study 
indicate  that  if  h2  estimation  is  the  primary  use  of  a  progeny  test  then  the  half-sib  mating  design 
is  the  proper  choice.  Further,  the  circular  mating  design  is  an  appropriate  choice  if  the 
estimation of rB is more important than h2. Finally, if a full-sib design is required to furnish
information about dominance variance, the circular design provides almost globally better
efficiencies for h2, rB, and γ than the half-diallel.


CHAPTER  3 

ORDINARY LEAST SQUARES ESTIMATION OF GENERAL AND SPECIFIC COMBINING ABILITIES FROM
HALF-DIALLEL MATING DESIGNS


Introduction 

The  diallel  mating  system  is  an  altered  factorial  design  in  which  the  same  individuals  (or 
lines)  are  used  as  both  male  and  female  parents.  A  full  diallel  contains  all  crosses,  including 
reciprocal crosses and selfs, resulting in a total of p^2 combinations, where p is the number of
parents. Assumptions that reciprocal effects, maternal effects, and paternal effects are negligible
lead to the use of the half-diallel mating system (Griffing 1956, method 4) which has p(p-1)/2
parental  combinations  and  is  the  mating  system  addressed  in  this  chapter. 
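A small sketch makes the count of parental combinations concrete. The function name is hypothetical; the enumeration simply lists each unordered pair of distinct parents once, which is what the omission of selfs and reciprocals amounts to.

```python
from itertools import combinations


def half_diallel_crosses(p):
    """All p(p-1)/2 crosses among parents 1..p, without selfs or reciprocals."""
    return list(combinations(range(1, p + 1), 2))


crosses = half_diallel_crosses(4)
print(len(crosses), crosses)
```

For four parents this lists the crosses 1x2, 1x3, 1x4, 2x3, 2x4, and 3x4.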

Half  diallels  have  been  widely  used  in  crop  and  tree  breeding  (Sprague  and  Tatum  1942, 
Gilbert  1958,  Matzinger  et  al.  1959,  Burley  et  al.  1966,  and  Squillace  1973)  and  the  widespread 
use  of  this  mating  system  continues  today  (Weir  and  Zobel  1975,  Wilcox  et  al.  1975,  Snyder  and 
Namkoong  1978,  Hallauer  and  Miranda  1981,  Singh  and  Singh  1984,  Greenwood  et  al.  1986, 
and  Weir  and  Goddard  1986). 

Most  of  the  statistical  packages  available  treat  fixed  effect  estimation  as  the  objective  of 
the  program  with  random  variables  representing  nuisance  variation.  Within  this  context  a 
common  analysis  of  half-diallel  experiments  is  conducted  by  first  treating  genetic  parameters  as 
fixed  effects  for  estimation  of  general  (GCA)  and  specific  (SCA)  combining  abilities  and 
subsequently  as  random  variables  for  variance  component  estimation  (used  for  estimating 
heritabilities,  genetic  correlations,  and  general  to  specific  combining  ability  variance  ratios  for 
determining breeding strategies). This chapter focuses on the estimation of GCA's and SCA's
as  fixed  effects.  The  treatment  of  GCA  and  SCA  as  fixed  effects  in  OLS  (ordinary  least  squares) 
is  an  entirely  appropriate  analysis  if  the  comparisons  are  among  parents  and  crosses  in  a 
particular  experiment.  If,  as  forest  geneticists  often  wish  to  do,  GCA  estimates  from 
disconnected  experiments  are  to  be  compared,  then  methods  such  as  checklots  must  be  used  to 
place  the  estimates  on  a  common  basis. 

Formulae  (Griffing  1956,  Falconer  1981,  Hallauer  and  Miranda  1981,  and  Becker  1975) 
for  hand  calculation  of  general  and  specific  combining  abilities  are  based  on  a  solution  to  the  OLS 
equations  for  half-diallels  created  by  sum-to-zero  restrictions,  i.e.,  the  sum  of  all  effect  estimates 
for  an  experimental  factor  equals  zero.  These  formulae  will  yield  correct  OLS  solutions  for  sum- 
to-zero  genetic  parameters  provided  the  data  have  no  missing  cells.  If  cell  (plot)  means  are  used 
as  the  basis  for  the  estimation  of  effects,  there  must  be  at  least  one  observation  per  cell  (plot) 
where  a  cell  is  a  subclassification  of  the  data  defined  by  one  level  of  every  factor  (Searle  1987). 
An example of a cell is the group of observations denoted by AB_ij for a randomized complete
block  design  with  factor  A  across  blocks  (B).  If  the  above  formulae  are  applied  without 
accounting  for  missing  cells,  incorrect  and  possibly  misleading  solutions  can  result.  The  matrix 
algebra  approach  is  described  in  this  chapter  for  these  reasons:  1)  in  forest  tree  breeding 
applications  data  sets  with  missing  cells  are  extremely  common;  2)  many  statistical  packages  do 
not  allow  direct  specification  of  the  half-diallel  model;  3)  the  use  of  a  linear  model  and  matrix 
algebra  can  yield  relevant  OLS  solutions  for  any  degree  of  data  imbalance;  and  4)  viewing  the 
mechanics  of  the  OLS  approach  is  an  aid  to  understanding  the  properties  of  the  estimates. 

The  objectives  of  this  chapter  are  to  (1)  detail  the  construction  of  ordinary  least  squares 
(OLS)  analysis  of  half-diallel  data  sets  to  estimate  genetic  parameters  (GCA  and  SCA)  as  fixed 
effects,  (2)  recount  the  assumptions  and  mathematical  features  of  this  type  of  analysis,  (3) 
facilitate the reader's implementation of OLS analyses for diallels of any degree of imbalance and
suggest a method for combining estimates from disconnected experiments, and (4) aid the reader
in ascertaining what method is an appropriate analysis for a given data set.

Methods 

Linear  Model 

Plot  means  are  used  as  the  unit  of  observation  for  this  analysis  with  unequal  numbers  of 
observations  per  plot.  Plot  (cell)  means  are  always  estimable  as  long  as  there  is  one  observation 
per  plot,  and  linear  combinations  of  these  means  (least  squares  means)  provide  the  most  efficient 
way  of  estimating  OLS  fixed  effects  (Yates  1934).  Throughout  this  chapter,  estimates  are 
denoted  by  lower  case  letters  while  the  parameters  are  designated  by  upper  case  letters  and 
matrices  are  in  bold  print. 

Using plot means as observations, a common scalar linear model for an analysis of a
half-diallel mating design with p(p-1)/2 crosses planted at a single location in a randomized
complete block design with one plot per block is

$y_{ijk} = \mu + B_i + GCA_j + GCA_k + SCA_{jk} + e_{ijk}$    (3-1)

where $y_{ijk}$ is the mean of the i-th block for the jk-th cross;
$\mu$ is an overall mean;
$B_i$ is the fixed effect of block i for i = 1 to b;
$GCA_j$ is the fixed general combining ability effect of the j-th female parent or
k-th male parent, j or k = 1, ..., p (j ≠ k);
$SCA_{jk}$ is the fixed specific combining ability effect of parents j and k; and
$e_{ijk}$ is the random error associated with the observation of the jk-th cross in
the i-th block where $e_{ijk} \sim (0, \sigma^2_e)$.

Cross by block interaction as genotype by environment interaction is treated as confounded with
between plot variation as for contiguous plots.

The  model  in  matrix  notation  is 

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$    (3-2)

where y is the vector of observations (n x 1 = n rows and 1 column), where n equals
the number of observations;
X is the design matrix (n x m) whose function is to select the appropriate parameters
for each observation, where m equals the number of fixed effect parameters in the
model;
β is the vector (m x 1) of fixed effect parameters ordered in a column; and
e is the vector (n x 1) of deviations (errors) from the expectation associated with each
observation.

Ordinary  Least  Squares  Solutions 

The  matrix  representation  of  an  OLS  fixed  effects  solution  is 

$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$    (3-3)

where b is the vector of estimated fixed effect parameters, i.e., an estimate of β, and
X is the design matrix made full rank by reparameterization; alternatively,
a generalized inverse of X'X may be used.
Inherent in this solution is the ordinary least squares assumption that the variance-covariance
matrix (V) of the observations (y) is equal to $\mathbf{I}\sigma^2_e$, where I is an n x n identity matrix.
The elements of an identity matrix are 1's on the main diagonal and all other elements are 0.
Multiplying I by $\sigma^2_e$ places $\sigma^2_e$ on the main diagonal. In the covariance matrix for the
observations, the variance of the observations appears on the main diagonal and the covariance
between observations appears in the off-diagonal elements. Thus, $\mathbf{V} = \mathbf{I}\sigma^2_e$ states that the
variance of the observations is equal to $\sigma^2_e$ for each observation and there are no covariances
between  the  observations  (which  is  one  direct  result  of  considering  genetic  parameters  as  fixed 
effects). 
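The OLS solution with a generalized inverse can be sketched in a few lines. The example below is illustrative only: it uses the Moore-Penrose inverse as one particular choice of generalized inverse, and the tiny one-factor design and data values are hypothetical rather than a half-diallel.

```python
import numpy as np


def ols_solution(X, y):
    """b = (X'X)^- X'y using the Moore-Penrose generalized inverse."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y


# Overparameterized one-factor model: columns are mean, level 1, level 2.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

b = ols_solution(X, y)
fitted = X @ b
print(b, fitted)
```

The individual elements of b depend on which generalized inverse is chosen, but the fitted values (here the two level means) do not, which is why estimable functions rather than raw parameter estimates are compared.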

Sum-to-Zero  Restrictions 

The  design  matrix  presented  in  this  chapter  is  reparameterized  by  sum-to-zero  restrictions 
to  (1)  reduce  the  dimension  of  the  matrices  to  a  minimal  size,  and  (2)  yield  estimates  of  fixed 
effects  with  the  same  solution  as  common  formulae  in  the  balanced  case.  Other  restrictions  such 
as  set-to-zero  could  also  be  applied  so  the  discussion  that  follows  treats  sum-to-zero  restrictions 
as  a  specific  solution  to  the  more  general  problem  which  is  finding  an  inverse  for  X'X.  The 
subscripts 'o' and 's' refer to the overparameterized model and the reparameterized model with
sum-to-zero  restrictions,  respectively. 

The matrix X_o of Figure 3-1 is the design matrix for an overparameterized linear model
(Milliken  and  Johnson  1984,  page  96).  Overparameterization  means  that  the  equations  are  written 
in  more  unknowns  (parameters,  in  this  case  13)  than  there  are  equations  (number  of  observations 
minus  degrees  of  freedom  for  error,  in  this  case  12  -  5  =  7)  with  which  to  estimate  the 
parameters.  Reparameterization  as  a  sum-to-zero  matrix  overcomes  this  dilemma  by  reducing 
the  number  of  parameters  through  making  some  of  the  parameters  linear  combinations  of  others. 
Sum-to-zero  restrictions  make  the  resulting  parameters  and  estimates  sum  to  zero  even  though 
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the  unrestricted  parameters  (for  example,  the  true  GCA  values  as  applied  to  a  broader  population) 

do  not  necessarily  sum-to-zero  within  a  diallel.    This  is  the  problem  of  comparability  of  GCA 

estimates  from  disconnected  experiments. 
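The count of 13 parameters against 7 independent equations can be checked numerically. The sketch below is illustrative only: the function names `overparameterized_design` and `rank` are hypothetical helpers, and the column layout (mean, blocks, GCA, SCA) is assumed from the column labels of Figure 3-1.

```python
from fractions import Fraction
from itertools import combinations

def overparameterized_design(p=4, blocks=2):
    """Rows of X_o for a p-parent half-diallel in `blocks` blocks (plot means)."""
    crosses = list(combinations(range(1, p + 1), 2))   # (1,2), (1,3), ...
    rows = []
    for i in range(blocks):
        for j, k in crosses:
            mean = [1]
            block = [1 if i == b else 0 for b in range(blocks)]
            gca = [1 if parent in (j, k) else 0 for parent in range(1, p + 1)]
            sca = [1 if cross == (j, k) else 0 for cross in crosses]
            rows.append(mean + block + gca + sca)
    return rows

def rank(matrix):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

X_o = overparameterized_design()
print(len(X_o), len(X_o[0]), rank(X_o))   # observations, parameters, rank
```

The rank (7) is smaller than the number of columns (13), which is why $X_o'X_o$ has no ordinary inverse.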


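[Figure 3-1: the matrix display is not recoverable from the source. It shows $y = X_o \beta_o$,
with the columns of $X_o$ labelled $\mu$, $B_1$, $B_2$, $GCA_1$, $GCA_2$, $GCA_3$, $GCA_4$,
$SCA_{12}$, $SCA_{13}$, $SCA_{14}$, $SCA_{23}$, $SCA_{24}$, $SCA_{34}$ and the elements of $y$
ordered $y_{112}$, $y_{113}$, $y_{114}$, $y_{123}$, $y_{124}$, $y_{134}$, $y_{212}$, $y_{213}$,
$y_{214}$, $y_{223}$, $y_{224}$, $y_{234}$.]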

Figure 3-1. The overparameterized linear model for a four-parent half-diallel planted on a single
site in two blocks displayed as matrices. The design matrix ($X_o$) and parameter vector
($\beta_o$) are shown in overparameterized form; 1's and 0's denote the presence or absence of a
parameter in the model for the observed means (data vector, $y$). The parameters displayed above
the design matrix label the appropriate column for each parameter. Error vector not exhibited.

Figure 3-2. The linear model for a four-parent half-diallel planted on a single site in two blocks
displayed as matrices. The design matrix ($X_s$) and the parameter vector ($\beta_s$) are
presented in sum-to-zero format. The parameters displayed above the design matrix label the
appropriate column for each parameter.


To illustrate the concept of sum-to-zero estimates versus population parameters, we use
the expectation of a common formula. Becker (1975) gives equation 3-4 (which for balanced
cases is equivalent to $g_j = ((p-1)/(p-2))(\bar{Z}_{j.} - \bar{Z}_{..})$) as the estimate for
general combining ability for the j-th line, with $p$ equalling the number of parents and
$Z_{jk}$ equalling the site mean of the j x k cross. This equation yields the same solution as
the matrix equations with no missing plots or crosses and with a design matrix which contains
the sum-to-zero restrictions. An evaluation of this formula in a four-parent half-diallel planted
in b blocks for the GCA of parent 1 is obtained by substituting the expectation of the linear
model (equation 3-1) for each observation:

$$g_j = \frac{1}{p(p-2)}\left(p Z_{j.} - 2 Z_{..}\right) \qquad (3\text{-}4)$$

$$E\{g_1\} = E\left\{\frac{1}{p(p-2)}\left(p Z_{1.} - 2 Z_{..}\right)\right\}$$

$$E\{g_1\} = \tfrac{3}{4} GCA_1 - \tfrac{1}{4}(GCA_2 + GCA_3 + GCA_4)
           + \tfrac{1}{4}(SCA_{12} + SCA_{13} + SCA_{14})
           - \tfrac{1}{4}(SCA_{23} + SCA_{24} + SCA_{34}).$$
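The coefficients in the expectation above can be reproduced by bookkeeping on the unrestricted model. The following is a minimal sketch, assuming that $Z_{j.}$ is the total of the cross means involving parent j and $Z_{..}$ the total over all crosses; block effects are left out because their coefficient cancels in equation 3-4. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

p = 4
crosses = list(combinations(range(1, p + 1), 2))
params = (["mu"] + [f"GCA{j}" for j in range(1, p + 1)]
          + [f"SCA{j}{k}" for j, k in crosses])

def expected_cross_mean(j, k):
    """Coefficients of E{Z_jk} on the unrestricted parameters."""
    coef = dict.fromkeys(params, Fraction(0))
    for name in ("mu", f"GCA{j}", f"GCA{k}", f"SCA{j}{k}"):
        coef[name] = Fraction(1)
    return coef

def expected_g(parent):
    """Coefficients of E{(1/(p(p-2)))(p Z_parent. - 2 Z..)}."""
    total = dict.fromkeys(params, Fraction(0))
    for j, k in crosses:
        weight = Fraction(-2)             # every cross enters -2 Z..
        if parent in (j, k):
            weight += p                   # crosses of `parent` also enter p Z_parent.
        for name, value in expected_cross_mean(j, k).items():
            total[name] += weight * value
    return {name: value / (p * (p - 2)) for name, value in total.items()}

coef = expected_g(1)
for name in params:
    print(name, coef[name])
```

The printed weights are 0 on the mean, 3/4 on $GCA_1$, -1/4 on the other GCA's, 1/4 on the SCA's involving parent 1 and -1/4 on the remaining SCA's.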

The result of equation 3-4 is obviously not $GCA_1$ from the unrestricted model (equation
3-1). Thus, $g_1$, an estimable function and an estimate of parameter $GCA_{1s}$ (the estimate
of the GCA of parent 1 given the sum-to-zero restrictions), does not have the same meaning as
$GCA_1$ in the unrestricted model. An estimable function is a linear combination of the
observations; but in order for an individual parameter in a model to be estimable, one must
devise a linear combination of the observations such that the expectation has a weight of one on
the parameter one wishes to estimate while having a weight of zero on all other parameters. A
solution such as this does not exist for the individual parameters in the overparameterized model
(equation 3-1). So, although the sum-to-zero restricted GCA parameters and estimates are forced
to sum to zero for the sample of parents in a given diallel, the unrestricted GCA parameters only
sum to zero across the entire population (Falconer 1981), and an evaluation of $GCA_{1s}$
demonstrates that the estimate contains other model parameters.

The result of sum-to-zero restrictions is that the degrees of freedom for a factor equals
the number of columns (parameters) for that factor in $X_s$ (Figure 3-2). Thus, a generalized
inverse for $X_s'X_s$ is not required since the number of columns in the sum-to-zero $X_s$ matrix
for each factor equals the degrees of freedom for that factor in the model ($X_s$ is full column
rank and provides a solution to equation 3-3).

Components  of  the  Matrix  Equation 

The components of equation 3-2 are now considered in greater detail.
Data  vector  y 

Observations (plot means) in the data vector are ordered in the manner demonstrated in
Figure 3-1. For our example, Figure 3-1 is the matrix equation of a four-parent half-diallel
mating design planted in two randomized complete blocks on a single site. There are six crosses
present in each of the two blocks for a total of 12 observations in the data vector, $y$. The
observations are first sorted by block. Second, within each block the observations should be in
the same sequence (for simplicity of presentation only). This sequence is obtained by assigning
numbers 1 through p to each of the p parents and then sorting all crosses containing parent 1
(whether as male or female) as the primary index in descending numerical order by the other
parent of the cross as the secondary index. Next, all crosses containing parent 2 (primary index,
as male or female) in which the other parent in the cross (secondary index) has a number greater
than 2 are also sorted in descending order by the secondary index. This procedure is followed
through using parent p-1 as the primary index.
Design matrix and parameter vector, X and β

The design matrix for a model is conceptually a listing of the parameters present in the
model for each observation (Searle 1987, page 243). In Figure 3-1, $y$ and $\beta_o$ are
exhibited and the parameters in $\beta_o$ are displayed at the tops of the columns of $X_o$ (a
visually correct interpretation of the multiplication of a matrix by a vector). For each
observation in $y$, the scalar model (equation 3-1) may be employed to obtain the listing of
parameters for that observation (the row of the design matrix corresponding to the particular
observation). The convention for design matrices is that the columns for the factors occur in
the same order as the factors in the linear model (equation 3-1 and Figure 3-1). Since design
matrices can be devised by first creating the columns pertinent to each factor in the model
(submatrices) and then horizontally and/or vertically stacking the submatrices, the discussion of
the reparameterized design matrix formulation will proceed by factor.
Mean 

The first column of $X_s$ is for $\mu$ and is a vector of 1's with the number of rows equalling
the number of observations (Figure 3-2). The linear model (equation 3-1) indicates that all
observations contain $\mu$ and the deviation of the observations from $\mu$ is explained in terms
of the factors and interactions in the model plus error.
Block 

The number of columns for block is equal to the number of blocks minus one (column
2, $X_s$). Each row of a block submatrix consists of 1's and 0's, or of -1's, according to the
identity of the observation for which the row is being formed. The normal convention is that the
first column represents block 1 and the second column block 2, etc., through block b-1. Since we
have used a sum-to-zero solution ($\sum_i b_i = 0$), the effect due to block b is a linear
combination of the other b-1 effects, i.e., $b_b = -\sum_{i=1}^{b-1} b_i$, which in our example
is $0 = b_1 + b_2$ and $b_2 = -b_1$. Thus, the row of the block submatrix for an observation in
block b (the last block) has a -1 in each of the b-1 columns, signifying that the block b effect
is indeed a linear combination of the other b-1 block effects. Columns 2 and 3 of $X_o$
(Figure 3-1) have become column 2 of $X_s$ (Figure 3-2).
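The block coding just described can be sketched in a few lines. This is an illustration under the stated convention (b-1 columns, last block coded as -1's); the function name `block_submatrix` is hypothetical.

```python
def block_submatrix(blocks, plots_per_block):
    """Sum-to-zero block columns: b - 1 columns, last block coded as -1's."""
    rows = []
    for i in range(1, blocks + 1):
        if i < blocks:
            row = [1 if col == i else 0 for col in range(1, blocks)]
        else:
            row = [-1] * (blocks - 1)     # block b = -(sum of the other effects)
        rows.extend(list(row) for _ in range(plots_per_block))
    return rows

# Four-parent example: two blocks of six crosses gives a single column.
two_blocks = block_submatrix(blocks=2, plots_per_block=6)
print([r[0] for r in two_blocks])

# Four blocks of 15 crosses: one representative row from each block.
four_blocks = block_submatrix(blocks=4, plots_per_block=15)
print(four_blocks[0], four_blocks[15], four_blocks[30], four_blocks[45])
```

With equal numbers of plots per block, every block column sums to zero, which is what makes the block columns orthogonal to the column of 1's for the mean.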
General combining ability

This submatrix of $X_s$ is slightly more complex than those of the previous factors as a result
of having two levels of a main effect present per observation, i.e., the deviation of an
observation from $\mu$ is modeled as the result of the GCA's of both the male and female parents
(equation 3-1). Again we have imposed a restriction, $\sum_j gca_j = 0$. Since GCA has p-1
degrees of freedom, the submatrix for GCA should have p-1 columns, i.e.,
$gca_p = -\sum_{j=1}^{p-1} gca_j$. The GCA submatrix for $X_s$ (columns 3 through 5 in
Figure 3-2) is formed from $X_o$ (columns 4 through 7 in Figure 3-1) in the same manner as the
block matrix: (1) add minus one to the elements in the other columns along each row containing a
one for $gca_p$ (p = 4 in our example); and (2) delete the column from $X_o$ corresponding to
$gca_p$. The GCA submatrix has p(p-1)/2 rows (the number of crosses). This, with no missing
cells (plots), equals the number of observations per block. To form the GCA factor submatrix for
a site, the GCA submatrix is vertically concatenated (stacked on itself) b times. This completes
the portion of the $X_s$ matrix for GCA.
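The two-step recipe for the GCA submatrix can be written out directly. The sketch below assumes the cross ordering of Figure 3-1 (1x2, 1x3, 1x4, 2x3, 2x4, 3x4); `gca_submatrix` is a hypothetical name.

```python
from itertools import combinations

def gca_submatrix(p, blocks=1):
    """Sum-to-zero GCA columns (p - 1 of them) for a p-parent half-diallel."""
    one_block = []
    for j, k in combinations(range(1, p + 1), 2):
        # Overparameterized row: a 1 under each parent of the cross.
        row = [1 if parent in (j, k) else 0 for parent in range(1, p + 1)]
        last = row.pop()                             # delete the gca_p column
        one_block.append([v - last for v in row])    # add -1 where gca_p was 1
    # Stack the one-block submatrix on itself once per block.
    return [list(r) for _ in range(blocks) for r in one_block]

for cross, row in zip(combinations(range(1, 5), 2), gca_submatrix(4)):
    print(cross, row)
```

For the cross 1 x 4, for example, the row becomes (0, -1, -1): the 1 under parent 1 is cancelled by the -1 contributed by parent 4.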
Specific  combining  ability 

In order to facilitate construction of the SCA submatrix, a horizontal direct product
should be defined. A horizontal direct product, as applied to two column vectors, is the
element-by-element product between the two vectors (SAS/IML¹ User's Guide 1985) such that the
element in the i-th row of the resulting product vector is the product of the elements in the
i-th rows of the two initial vectors. The resultant product vector has dimension n x 1. A
horizontal direct product is useful for the formation of interaction or nested factor submatrices
where the initial matrices represent the main factors and the resulting matrix represents an
interaction or a nested factor (product rule, Searle 1987).

¹SAS/IML is the registered trademark of the SAS Institute Inc., Cary, North Carolina.
The SCA submatrix can be formulated from the horizontal direct products of the columns
of the GCA submatrix in $X_s$ (Figure 3-2). The results from the GCA columns require
manipulation to become the SCA submatrix (since degrees of freedom for SCA do not equal those
of an interaction for a half-diallel analysis), but the GCA column products provide a convenient
starting point. The column of the SCA submatrix representing the cross between the j-th and the
k-th parents ($SCA_{jk}$) is formed as the product between the $GCA_j$ and $GCA_k$ columns
(Figure 3-3). The GCA columns in Figure 3-2 are multiplied in this order: column 1 times column
2 forming the first SCA column, column 1 times column 3 forming the second SCA column, and
column 2 times column 3 forming the third SCA column (Figure 3-3). With four parents (six
crosses) there are three degrees of freedom for GCA (p-1) and two degrees of freedom for SCA
(6 crosses - 3 for GCA - 1 for the mean). Since SCA has only two degrees of freedom, a
sum-to-zero design matrix can have only two columns for SCA. Imposing the restriction that the
sum of the SCA's across all parents equals zero is equivalent to making the last column for the
SCA submatrix (Figure 3-3) a linear combination of the others (Figure 3-2). The procedure for
deleting the third column product is identical to that for the GCA submatrix: add minus one to
every element in the rows of the remaining SCA columns in which a one appears in the column
which is to be deleted (Figure 3-2, columns 6 and 7). The number of rows in the SCA submatrix
equals the number of observations in a block, and the submatrix must be vertically concatenated
b times to create the SCA submatrix for a site.
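A short sketch of the construction: form the element-by-element products of the sum-to-zero GCA columns, then fold the last product column into the others. The function names are hypothetical, and dropping the last product (rather than some other one) is an assumption that matches the four-parent description.

```python
from itertools import combinations

def gca_rows(p):
    """One block of sum-to-zero GCA rows for a p-parent half-diallel."""
    rows = []
    for j, k in combinations(range(1, p + 1), 2):
        full = [1 if parent in (j, k) else 0 for parent in range(1, p + 1)]
        rows.append([v - full[-1] for v in full[:-1]])
    return rows

def sca_submatrix(p):
    """One block of sum-to-zero SCA rows built from GCA column products."""
    pairs = list(combinations(range(p - 1), 2))     # GCA column pairs (0-based)
    rows = []
    for g in gca_rows(p):
        products = [g[a] * g[b] for a, b in pairs]  # horizontal direct products
        last = products.pop()                       # column to be deleted
        rows.append([v - last for v in products])   # add -1 where it held a 1
    return rows

for cross, row in zip(combinations(range(1, 5), 2), sca_submatrix(4)):
    print(cross, row)
```

For four parents the rows for crosses 1x4 and 2x3 are both (-1, -1), while 2x4 repeats the row of 1x3 and 3x4 repeats the row of 1x2. For six parents the same function returns nine SCA columns.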

An algebraic evaluation of SCA sum-to-zero restrictions requires that $\sum_j sca_{jk} = 0$ for
each k and that $\sum_j \sum_k sca_{jk} = 0$; thus, for observations in the i-th block, with i
serving to denote the row of the SCA submatrix in block i, $sca_{i14} = -sca_{i12} - sca_{i13}$
and entries in the submatrix row for $y_{i14}$ are -1's. The estimate for $sca_{i23}$ equals
$sca_{i14}$ because $sca_{i23}$ is the negative of the sum of the independently estimated SCA's
($sca_{i12}$ and $sca_{i13}$) from the restriction that the sum of the SCA's across all parents
equals zero. Similarly, by sum-to-zero definition $sca_{i24} = -sca_{i12} - sca_{i23}$ and by
substitution $sca_{i24} = -(-sca_{i12} - sca_{i13}) - sca_{i12} = sca_{i13}$. By the same
protocol, it can be shown that $sca_{i34} = sca_{i12}$. The elements in the rows of the SCA
submatrix are 1's, -1's and 0's in accordance with the algebraic evaluation. Thus, while it may
seem that there should be 6 SCA values (one for each cross), only 2 can be independently
estimated and the remaining 4 are linear combinations of the independently estimated SCA's.
Again the SCA sum-to-zero estimates are not equal to the parametric population SCA's. An
analogous illustration for SCA to that for GCA would show that the estimable function (linear
combination of observations) for a given $SCA_s$ contains a variety of other parameters.
[Figure 3-3: the table is not recoverable from the source. Its columns are the products
$GCA_1 \times GCA_2$, $GCA_1 \times GCA_3$ and $GCA_2 \times GCA_3$, with each entry written as the
product of the two GCA elements for that observation, e.g. (0)(-1) = 0.]

Figure 3-3. Intermediate result in SCA submatrix generation (SCA columns as horizontal direct
products of $GCA_1$, $GCA_2$, and $GCA_3$ columns within a block). The $SCA_{jk}$ column is the
horizontal direct product of the columns for $GCA_j$ and $GCA_k$.


Estimation  of  Fixed  Effects 


GCA  parameters 

The GCA parameters can be estimated (without mean, block, and SCA in the design
matrix) through the use of equation 3-3, if there are no missing cell means (plots) for any cross
and no missing crosses. The design matrix consists only of the GCA submatrix. This design
matrix has p-1 columns for the GCA's (the third through the fifth columns of $X_s$). The b vector
is an estimate of the GCA portion of $\beta_s$ as in Figure 3-2, and the linear combination for
the estimation of $gca_p$ is $gca_p = -\sum_{j=1}^{p-1} gca_j$. Parameters for any of the factors
can be estimated independently using the pertinent submatrix as long as there are no missing
cell means (plots) and no missing crosses; this uses a property known as orthogonality.

Orthogonality requires that the dot product between two vectors equals zero (Schneider
1987, page 168). The dot product (a scalar) is the sum of the values in a vector obtained from
the horizontal direct product of two vectors. For two factors to be orthogonal, the dot products
of all the column vectors making up the section of the design matrix for one factor with the
column vectors making up the portion of the design matrix for the second must be zero. If all
factors in the model are orthogonal, then the $X_s'X_s$ matrix is block diagonal. A
block-diagonal $X_s'X_s$ matrix is composed of square factor submatrices (degrees of freedom x
degrees of freedom) along the diagonal with all off-diagonal elements not in one of the square
factor submatrices equalling zero. A property of block-diagonal matrices is that the inverse can
be calculated by inverting each block separately and replacing the original block in the full
$X'X$ matrix by the inverted block. Because the blocks can be inverted separately and all other
off-diagonal elements of the inverse are zero, the effects for factors which are orthogonal to
all other factors may be estimated separately, i.e., there are no functions of other sum-to-zero
factors in the sum-to-zero estimates.
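The block-diagonal structure can be checked numerically for a balanced design. The sketch below rebuilds the sum-to-zero design matrix with the coding described earlier in the chapter and tests every element of $X_s'X_s$ that lies outside the factor blocks. All function names are hypothetical, and the column order (mean, block, GCA, SCA) is assumed.

```python
from itertools import combinations

def sum_to_zero_design(p, blocks):
    """Return (X_s, factor sizes) for a balanced p-parent half-diallel."""
    crosses = list(combinations(range(1, p + 1), 2))
    pairs = list(combinations(range(p - 1), 2))     # GCA column pairs (0-based)
    rows = []
    for i in range(1, blocks + 1):
        block = ([-1] * (blocks - 1) if i == blocks
                 else [int(c == i) for c in range(1, blocks)])
        for j, k in crosses:
            full = [int(parent in (j, k)) for parent in range(1, p + 1)]
            gca = [v - full[-1] for v in full[:-1]]
            prod = [gca[a] * gca[b] for a, b in pairs]
            sca = [v - prod[-1] for v in prod[:-1]]
            rows.append([1] + block + gca + sca)
    sizes = [1, blocks - 1, p - 1, len(pairs) - 1]
    return rows, sizes

def cross_product(X):
    """X'X as a list of lists."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def is_block_diagonal(M, sizes):
    """True if every entry outside the diagonal factor blocks is zero."""
    factor = [f for f, n in enumerate(sizes) for _ in range(n)]
    return all(M[r][c] == 0
               for r in range(len(M)) for c in range(len(M))
               if factor[r] != factor[c])

X_s, sizes = sum_to_zero_design(p=4, blocks=2)
XtX = cross_product(X_s)
print(sizes, is_block_diagonal(XtX, sizes))
```

Within a factor the off-diagonal elements need not be zero (the GCA columns are not orthogonal to one another); only the elements linking different factors vanish.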
Mean, block, GCA and SCA parameters

All parameters are estimated simultaneously by horizontally concatenating the mean,
block, GCA, and SCA matrices to create $X_s$. Equation 3-3 is again utilized to solve the system
of equations. The b vector for the four-parent example is an estimate of $\beta_s$ of Figure 3-2.
Again, one parameter is estimated for each column in the $X_s$ matrix and all parameter estimates
not present are linear combinations of the parameter estimates in the b vector. So $b_b$ is equal
to $-\sum_{i=1}^{b-1} b_i$ and $gca_p$ is equal to $-\sum_{j=1}^{p-1} gca_j$. The linear
combinations for SCA effects can be obtained by reading along the row of the SCA submatrix
associated with the observation containing the parameter, i.e., in Figure 3-2 the observation
$y_{114}$ contains the effect $sca_{14}$, which is estimated as the linear combination
$-sca_{12} - sca_{13}$.
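As an illustration of solving for all parameters at once, the sketch below assumes that equation 3-3 is the usual normal-equation solution $b = (X_s'X_s)^{-1}X_s'y$ and applies it to the four-parent, two-block design. The parameter values in `beta` are made up for the demonstration (they are not data from this study), and `y` is generated without error so that the solution can be checked exactly.

```python
from fractions import Fraction
from itertools import combinations

def sum_to_zero_design(p, blocks):
    """Rows [mean | block | GCA | SCA] of a balanced sum-to-zero design."""
    crosses = list(combinations(range(1, p + 1), 2))
    pairs = list(combinations(range(p - 1), 2))
    rows = []
    for i in range(1, blocks + 1):
        block = ([-1] * (blocks - 1) if i == blocks
                 else [int(c == i) for c in range(1, blocks)])
        for j, k in crosses:
            full = [int(parent in (j, k)) for parent in range(1, p + 1)]
            gca = [v - full[-1] for v in full[:-1]]
            prod = [gca[a] * gca[b] for a, b in pairs]
            rows.append([1] + block + gca + [v - prod[-1] for v in prod[:-1]])
    return rows

def solve_normal_equations(X, y):
    """b = (X'X)^-1 X'y by Gauss-Jordan elimination in exact arithmetic."""
    cols = list(zip(*X))
    n = len(cols)
    # Augmented matrix [X'X | X'y].
    A = [[Fraction(sum(a * b for a, b in zip(ci, cj))) for cj in cols]
         + [Fraction(sum(a * v for a, v in zip(ci, y)))] for ci in cols]
    for c in range(n):
        pivot = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                A[r] = [v - A[r][c] * w for v, w in zip(A[r], A[c])]
    return [row[-1] for row in A]

X_s = sum_to_zero_design(p=4, blocks=2)
# Hypothetical values: mean, b1, gca1..gca3, sca12, sca13.
beta = [Fraction(v) for v in ("2.5", "0.1", "0.2", "-0.1", "0.3", "0.05", "-0.15")]
y = [sum(x * b for x, b in zip(row, beta)) for row in X_s]

b = solve_normal_equations(X_s, y)
block_2 = -b[1]                  # b_2 = -b_1
gca_4 = -sum(b[2:5])             # gca_p = -(sum of the estimated GCA's)
sca_14 = -(b[5] + b[6])          # read from the y_114 row of the SCA submatrix
print([float(v) for v in b], float(block_2), float(gca_4), float(sca_14))
```

The effects that have no column of their own (the last block, the last parent, and four of the six SCA's) are recovered afterwards as linear combinations of the solved values.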

This completes the estimation of fixed effect parameters from a data set which is balanced
on a plot-mean basis. Since field data sets with such completeness are a rarity in forestry
applications, the next step is OLS analysis for various types of data imbalance. Calculations of
solutions based on a complete data set and simulated data sets with common types of imbalance
are demonstrated in numerical examples.

Numerical  Examples 

The data set analyzed in the numerical examples is from a five-year-old, six-parent
half-diallel slash pine (Pinus elliottii var. elliottii Engelm.) progeny test planted on a single
site in four complete blocks. Each cross is represented by a five-tree row plot within each
block. Total
height  in  meters  and  diameter  at  breast  height  (dbh  in  centimeters)  are  the  traits  selected  for 
analysis.  The  data  set  is  presented  in  Table  3-1  so  that  the  reader  may  reconstruct  the  analysis 
and  compare  answers  with  the  examples.  The  numbers  1  through  6  were  arbitrarily  assigned  to 
the  parents  for  analysis.  Because  of  unequal  survival  within  plots,  plot  means  are  used  as  the  unit 
of  observation. 

Balanced  Data  (Plot-mean  Basis) 

The sum-to-zero design matrix for the balanced data set has (4 blocks) x (15 crosses) = 60
rows (which equals the number of observations in $y$) and has the following columns: one column
for $\mu$, three columns for blocks (b-1), five columns for GCA (p-1), and nine columns for SCA
(15 crosses - 5 - 1) for a total of 18 columns. With sixty plot means (degrees of freedom) and
18 degrees of freedom in the model, subtracting 18 from 60 yields 42 degrees of freedom for
error, which matches the degrees of freedom for cross by block interaction, thus verifying that
degrees of freedom concur with the number of columns in the sum-to-zero design matrix.
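The bookkeeping above is simple enough to express as a small function. This is a sketch for the balanced plot-mean case only; `sum_to_zero_columns` is a hypothetical name.

```python
def sum_to_zero_columns(parents, blocks):
    """Columns of the sum-to-zero design matrix by factor, plus error df.

    Assumes plot means as observations and every cross present in every block.
    """
    crosses = parents * (parents - 1) // 2
    columns = {
        "mean": 1,
        "block": blocks - 1,
        "GCA": parents - 1,
        "SCA": crosses - (parents - 1) - 1,
    }
    observations = crosses * blocks
    error_df = observations - sum(columns.values())
    return columns, observations, error_df

columns, n_obs, error_df = sum_to_zero_columns(parents=6, blocks=4)
print(columns, n_obs, error_df)
# Error df should equal the cross-by-block interaction df.
print(error_df == (15 - 1) * (4 - 1))
```

The same function applied to the four-parent, two-block example gives 7 columns and 5 degrees of freedom for error from 12 plot means.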

To illustrate the principle of orthogonality in the balanced case, the $X'X$ and $(X'X)^{-1}$
matrices may be printed to show that they are block diagonal. In further illustration, the
effects within a factor may also be estimated without any other factors in the design matrix and
compared to the estimates from the full design matrix.

The vectors of parameter estimates for height and dbh (Table 3-2) were calculated from the
same $X_s$ matrix because height and dbh measurements were taken on the same trees. In other
words, if a height measurement was taken on a tree, a dbh measurement was also taken, so the
design matrices are equivalent.

Missing  Plot 

To illustrate the problem of a missing plot, the cross, parent two by parent three, was
arbitrarily deleted in block one (as if observation $y_{123}$ were missing). This deletion
prompts adjustments to the factor matrices in order to analyze the new data set. The new vector
of observations ($y$) now has 59 rows. This necessitates deletion of the row of the design matrix
($X_s$) in block 1 which would have been associated with cross 2 x 3. This is the only matrix
alteration required for the analysis. Thus, the resultant $X_s$ matrix has 60 - 1 = 59 rows and
18 columns. With 59 means in $y$ and 18 columns in $X_s$, the degrees of freedom for error is 41.

Comparisons between results of the analyses (Table 3-2) of the full data set and the data
set missing observation $y_{123}$ reveal that for this case the estimates of parameters have been
relatively unaffected by the imbalance (magnitudes of GCA's changed only slightly and rankings
by GCA were unaffected).
Table 3-1. Data set for numerical examples. Five-year-old slash pine progeny test with a
6-parent half-diallel mating design present on a single site with four randomized complete blocks
and a five-tree row plot per cross per block.

                          Mean      Mean       Within-plot variance    Trees
                          Height    DBH        Height      DBH         per
Block   Female   Male     (m)       (cm)       (m²)        (cm²)       Plot

  1       1       2      2.6899    3.810      0.9800      3.484         4
  1       1       3      1.9080    2.134      1.4277      3.893         5
  1       1       5      3.1242    4.445      0.4487      1.656         4
  1       1       6      2.4933    3.200      0.8488      5.664         5
  1       2       5      1.4783    1.588      0.6556      2.167         4
  1       2       6      2.7026    3.471      0.1136      0.344         3
  1       3       2      3.0480    4.699      0.2341      0.968         4
  1       3       5      3.4991    5.131      0.0945      0.271         5
  1       3       6      2.4003    2.794      0.5149      1.548         4
  1       4       1      3.3955    4.928      0.1489      0.761         5
  1       4       2      3.4290    5.144      0.7943      3.285         4
  1       4       3      2.5298    2.984      0.9557      4.188         4
  1       4       5      2.4155    3.175      0.5936      2.946         4
  1       4       6      3.2004    4.521      1.7034      7.594         5
  1       5       6      2.2403    2.794      1.0433      6.280         4
  2       1       2      3.5662    5.080      0.9560      2.903         5
  2       1       3      2.6335    3.353      0.7695      3.497         5
  2       1       5      3.6942    5.893      0.0573      0.432         5
  2       1       6      3.4808    4.928      0.9222      2.890         5
  2       2       5      3.4260    4.877      0.7017      2.432         5
  2       2       6      2.4282    3.302      0.0616      0.452         3
  2       3       2      3.0480    4.064      0.0192      0.301         4
  2       3       5      2.8895    4.013      0.1957      0.690         5
  2       3       6      1.9406    1.863      0.0560      0.408         3
  2       4       1      3.0114    3.962      1.9753      6.342         5
  2       4       2      3.6454    5.283      0.1731      0.787         5
  2       4       3      2.9566    3.861      0.0506      0.174         5
  2       4       5      2.8118    4.382      1.1336      5.435         4
  2       4       6      3.2674    4.318      1.1211      4.354         5
  2       5       6      3.7917    5.893      0.0848      0.497         5
  3       1       2      2.2961    2.625      0.3914      1.699         3
  3       1       3      2.8956    4.128      1.2926      4.532         4
  3       1       5      2.5359    3.607      0.8284      4.303         5
  3       1       6      2.9032    3.937      0.8252      4.064         4
  3       2       5      2.7737    4.064      0.9829      3.226         2
  3       2       6      1.2040    0.635      0.4464      0.806         2
  3       3       2      2.9870    4.191      0.9049      2.989         4
  3       3       5      2.8407    3.962      0.7309      3.632         5
  3       3       6      1.3564    0.000      0.1677      0.000         2
  3       4       1      2.6746    3.620      0.8463      2.984         4
  3       4       2      2.7066    3.353      0.5590      1.787         5
  3       4       3      3.4198    4.623      0.3509      0.690         5
  3       4       5      3.3299    4.953      0.4102      1.226         4
  3       4       6      3.4564    4.978      0.8369      3.503         5
  3       5       6      3.2614    4.826                                1
  4       1       2      1.8974    2.476      1.0160      3.629         4
  4       1       3      1.3005    0.508      0.2019      0.774         3
  4       1       5      2.0726    2.540      1.2235      5.097         3
  4       1       6      1.8821    1.778      0.4728      3.312         4
  4       2       5      1.??64    1.334      0.5354      2.382         4
  4       2       6      1.5392    0.635      0.0376      0.806         2
  4       3       2      1.8898    2.032      0.7364      1.892         4
  4       3       5      2.5146    3.620      0.0876      0.446         4
  4       3       6      1.8389    2.201      0.0941      0.280         3
  4       4       1      2.3348    2.591      0.3816      2.722         5
  4       4       2      1.7272    1.693      2.1640      8.602         3
  4       4       3      1.6581    1.524      0.0537      0.903         5
  4       4       5      2.1184    2.286      0.3137      2.366         4
  4       4       6      1.5545    1.422      0.4803      1.019         5
  4       5       6      1.4122    1.693      0.0338      0.150         3

(?? marks digits that are illegible in the source.)
Table 3-2. Numerical results for examples of data imbalance using the OLS techniques presented
in the text.

                                                                              Five Missing
                 Balanced(a)         Missing Plot(b)     Missing Cross(c)     Crosses(d)
Estimate of(e)   Height    DBH       Height    DBH       Height    DBH        Height    DBH

μ                2.5830    3.362     2.5787    3.346     2.5386    3.260      2.4980    3.149
B1               0.1203    0.292     0.1074    0.245     0.1074    0.245      0.1393    0.309
B2               0.5230    0.976     0.5274    0.992     0.5386    1.023      0.6041    1.140
B3               0.1264    0.205     0.1308    0.220     0.1180    0.187      0.0689    0.087
GCA1             0.0706    0.144     0.0760    0.163     0.1260    0.270      0.1361    0.232
GCA2            -0.1077   -0.180    -0.1186   -0.220    -0.2186   -0.434     -0.2371   -0.493
GCA3            -0.1316   -0.347    -0.1426   -0.386    -0.2426   -0.601     -0.3972   -0.952
GCA4             0.2489    0.398     0.2544    0.417     0.3044    0.524      0.4241    0.804
GCA5             0.1265    0.489     0.1320    0.509     0.1820    0.616      0.1746    0.646
SCA12            0.0665    0.172     0.0763    0.208     0.1663    0.400
SCA13           -0.3374   -0.628    -0.3277   -0.592    -0.2377   -0.400
SCA14           -0.0484   -0.128    -0.0550   -0.152    -0.1150   -0.280     -0.2041   -0.410
SCA15            0.0766    0.126     0.0700    0.102     0.0100   -0.026      0.0480    0.094
SCA23            0.3995    0.912     0.3600    0.771
SCA24            0.1528    0.289     0.1627    0.324     0.2527    0.517      0.1920    0.408
SCA25           -0.3185   -0.706    -0.3084   -0.670    -0.2187   -0.478
SCA34           -0.0592    0.164    -0.0493    0.129     0.0406    0.064      0.1163    0.246
SCA35            0.3580    0.677     0.3679    0.712     0.4793    0.905

(a) where (numerical examples are for height)
    $b_4 = -\sum_{i=1}^{3} b_i = -0.7697$;
    $gca_6 = -\sum_{j=1}^{5} gca_j = -0.2067$;
    $sca_{p6} = -\sum sca_{jk}$ for j or k = p and p = 1, 2, 3; then $sca_{16} = 0.2428$,
    $sca_{26} = -0.3002$, and $sca_{36} = -0.3608$; $sca_{45} = -\sum_{e=1}^{9} sca_e = -0.2898$,
    e = independently estimated SCA's 1, ..., 9;
    $sca_{46} = sca_{12} + sca_{13} + sca_{15} + sca_{23} + sca_{25} + sca_{35} = 0.2446$;
    and $sca_{56} = sca_{12} + sca_{13} + sca_{14} + sca_{23} + sca_{24} + sca_{34} = 0.1737$.

(b) where the linear combinations for parameter estimates are identical to the balanced example.

(c) where $sca_{p6} = -\sum sca_{jk}$ for j or k = p and p = 1 to 3; $sca_{45} = -\sum_e sca_e$,
    e = independently estimated SCA's 1, ..., 8;
    $sca_{46} = sca_{12} + sca_{13} + sca_{15} + sca_{25} + sca_{35}$; and
    $sca_{56} = sca_{12} + sca_{13} + sca_{14} + sca_{24} + sca_{34}$.

(d) where $sca_{16} = -sca_{14} - sca_{15}$; [the remaining linear combinations are illegible in
    the source]; and the one remaining SCA is the negative of the sum of the four independently
    estimated SCA's.

(e) where for all cases linear combinations for block and gca are the same as in the balanced
    case.
Missing Cross

Another common form of imbalance in diallel data sets, the missing cross, is examined
through arbitrary deletion of the 2 x 3 cross from all blocks, i.e., $y_{123}$, $y_{223}$,
$y_{323}$, and $y_{423}$ are missing in the data vector. This type of imbalance is representative
of a particular cross that could not be made and is therefore missing from all blocks. The matrix
manipulations required for this analysis are again presented by factor. For appropriate SCA
restrictions, the data vector and design matrix should be ordered so that the p-th parent has no
missing crosses. Since the labeling of a parent as parent p is entirely subjective, any parent
with all crosses may be designated as parent p. The previous labelling directions are necessary
since we generate the SCA submatrix as horizontal direct products of the columns of the GCA
submatrix; and to account for missing crosses, the horizontal direct product for each missing
parental combination is not calculated, which sets the missing SCA's to zero. If there is a cross
missing from those of the p-th parent, we cannot account for the missing cross with this
technique (Searle 1987, page 479).

For the mean, block, and GCA submatrices, the adjustment for the missing cross dictates
deleting the rows in the submatrices which would have corresponded to the $y_{i23}$ observations.
The SCA submatrix must be reformed since a degree of freedom for SCA and hence a column of the
submatrix has been lost. The SCA submatrix is reinstituted from the GCA horizontal direct
products (remembering that one cross, 2 x 3, no longer exists and therefore that product
$GCA_2 \times GCA_3$ is inappropriate). Dropping the column for $SCA_{23}$ is equivalent to setting
$SCA_{23}$ to zero (Searle 1987) so that the remaining SCA's will sum to zero. After that, the
reformation is according to the established pattern. With one missing cross there are now 56
observations and hence 56 degrees of freedom available. The columns of the $X_s$ matrix are now:
one for the mean, three for block, five for GCA, and eight for SCA for a total of 17 columns. The
remaining degrees of freedom for error is 39, matching the correct degrees of freedom
((14-1) x (4-1) = 39).

For the missing cross example $\mu$ is no longer equivalent to the mean of the plot means,
since $\mu = 2.5386$ and $\sum_{ijk} y_{ijk}/N = 2.5715$ where N = 56 (number of plot means). This
is the result of GCA effects which are no longer orthogonal to the mean. Check the $X_s'X_s$
matrix or try estimating factors separately and compare to the estimates when all factors are
included in $X_s$.
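The loss of orthogonality between the mean and GCA can be seen without any data, from the design matrix alone. The sketch below computes the dot product of the column of 1's with each sum-to-zero GCA column, first for the complete six-parent design and then with the 2 x 3 cross removed from all four blocks. The function names are hypothetical.

```python
from itertools import combinations

def gca_rows(p, missing=()):
    """One block of sum-to-zero GCA rows, skipping crosses that were not made."""
    missing = {tuple(sorted(m)) for m in missing}
    rows = []
    for j, k in combinations(range(1, p + 1), 2):
        if (j, k) in missing:
            continue
        full = [int(parent in (j, k)) for parent in range(1, p + 1)]
        rows.append([v - full[-1] for v in full[:-1]])
    return rows

def mean_dot_gca(p, blocks, missing=()):
    """Dot product of the column of 1's with each GCA column of X_s."""
    per_block = [sum(col) for col in zip(*gca_rows(p, missing))]
    return [blocks * s for s in per_block]

print(mean_dot_gca(6, 4))                      # balanced design
print(mean_dot_gca(6, 4, missing=[(2, 3)]))    # cross 2 x 3 absent
```

In the balanced design all five dot products are zero. With the cross removed, the columns for parents 2 and 3 no longer sum to zero, so the corresponding elements of $X_s'X_s$ linking the mean and GCA are nonzero.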

If formulae for balanced data (Becker 1975, Falconer 1981, and Hallauer and Miranda
1981) are applied to unbalanced data (plot-mean basis), estimates of parameters are no longer
appropriate because factors in the model are no longer independent (orthogonal). Applying
Becker's formula, which uses totals of cross means for a site ($y_{.jk}$), to the missing cross
example yields: $gca_1 = .2992$, $gca_2 = -.5649$, $gca_3 = -.5888$, $gca_4 = .4665$,
$gca_5 = .3552$, and $gca_6 = .0219$. These answers are very different in magnitude from those in
Table 3-2 for this example and $gca_6$ also has a different sign. Employing these formulae in the
analysis of unbalanced data is analogous to matrix estimation of GCA's without the other factors
in the model, which is inappropriate.

Several  Missing  Crosses 

The  concluding  example  (Table  3-2)  is  a  drastically  unbalanced  data  set  resulting  from 
the arbitrary deletion of five crosses (1 x 2, 1 x 3, 2 x 3, 3 x 5, and 4 x 5). The matrix
manipulation for this example is an extension of the previous one cross deletion example. Rows
corresponding to y_i12, y_i13, y_i23, y_i35, and y_i45 are deleted from the mean, block and GCA
submatrices for all blocks. The SCA matrix (now 4 columns: 10 crosses - 5 - 1 = 4 degrees
of freedom) is again reformed with only the relevant products of the GCA columns. Counting
degrees of freedom (columns of the sum-to-zero design matrix), the mean has one, block has
three, GCA has five, and SCA has four degrees of freedom for a total of 13. Error has
(4-1)(10-1) = 27 degrees of freedom. Totaling degrees of freedom for modeled effects and error
yields 40, which equals the number of plot means.

In increasingly unbalanced cases (Table 3-2), the spread among the GCA estimates tends
to increase with increasing imbalance (loss of information). This is a general feature of OLS
analyses: the spread among the GCA estimates reflects both the innate spread due to additive
genetic effects and the error in estimation of the GCA's. When there is less information, GCA
estimates tend to be more widely spread due to the increase in the error variance associated
with their estimation. This feature has been noted (White and Hodge 1989, page 54) as the
tendency to pick as parental winners individuals in a breeding program which are the most
poorly tested.

Discussion 

After  developing  the  OLS  analysis  and  describing  the  inherent  assumptions  of  the 
analysis,  there  are  four  important  factors  to  consider  in  the  interpretation  of  sum-to-zero  OLS 
solutions: (1) the lack of uniqueness of the parameter estimates; (2) the weights given to plot
means (y_ijk) and in turn site means (y_.jk) for crosses in parameter estimation for data sets
with missing crosses; (3) the arbitrary nature of using a diallel mean (perforce a narrow genetic
base) as the mean about which the GCA's sum-to-zero; and (4) the assumption that the covariance
matrix for the observations (V) is Iσ²_e.

Uniqueness  of  Estimates 

Sum-to-zero restrictions furnish what would appear to be unique estimates of the
individual parameters, e.g., GCA_j, when, in fact, these individual parameters are not estimable
(Graybill 1976, Freund and Littell 1981, and Milliken and Johnson 1984). The lack of
estimability is again analogous to attempting to solve a set of equations in n unknowns with t
equations where n is greater than t. Therefore, an infinite number of solutions exist for β.

There are quantities in this system of equations that are unique (estimable), i.e., the
estimate is invariant regardless of the restriction (sum-to-zero or set-to-zero) or generalized
inverse (no restrictions) used (Milliken and Johnson 1984). The estimable functions include the
sum-to-zero GCA and SCA estimates, since they are linear combinations of the observations.
These estimable quantities, however, do not estimate the individual parametric GCA's and SCA's
of the overparameterized model (equation 3-4), since there is no unique solution for those
parameters.
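The distinction between non-unique solutions and invariant estimable functions can be illustrated with a much smaller model than the diallel. The sketch below assumes a hypothetical one-way layout (three groups, two observations each, invented data) and solves the overparameterized system three ways: with a generalized inverse, with a set-to-zero restriction, and with a sum-to-zero restriction.

```python
import numpy as np

# Hypothetical data: 3 groups with 2 observations each.
y = np.array([4.0, 6.0, 9.0, 11.0, 1.0, 3.0])
group = np.array([0, 0, 1, 1, 2, 2])

# Overparameterized design: columns are mu, a1, a2, a3 (rank 3, not 4).
X = np.column_stack([np.ones(6)] + [(group == g).astype(float) for g in range(3)])

# Solution 1: Moore-Penrose generalized inverse (no restriction imposed).
b_ginv = np.linalg.pinv(X) @ y

# Solution 2: set-to-zero restriction (a3 = 0): drop the last column.
b_sub, *_ = np.linalg.lstsq(X[:, :3], y, rcond=None)
b_set0 = np.append(b_sub, 0.0)

# Solution 3: sum-to-zero restriction (a3 = -a1 - a2).
Xs = np.column_stack([X[:, 0], X[:, 1] - X[:, 3], X[:, 2] - X[:, 3]])
bs, *_ = np.linalg.lstsq(Xs, y, rcond=None)
b_sum0 = np.array([bs[0], bs[1], bs[2], -bs[1] - bs[2]])

for name, b in [("g-inverse", b_ginv), ("set-to-zero", b_set0), ("sum-to-zero", b_sum0)]:
    # The individual entries of b differ; the contrast a1 - a2 and X @ b do not.
    print(name, np.round(b, 4), "a1 - a2 =", round(b[1] - b[2], 4))
```

The three solution vectors disagree element by element, yet the fitted values and any contrast among the group effects agree, which is the sense in which only certain linear functions of the parameters are estimable.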

Weighting  of  Plot  Means  and  Cross  Means  in  Estimating  Parameters 

With  at  least  one  measurement  tree  in  each  plot  and  with  plot  means  as  the  unit  of 
observation,  use  of  the  matrix  approach  produces  the  same  results  as  the  basic  formulae.  The 
weight placed on each plot mean in the estimation of a parameter can be determined by
calculating (X_s'X_s)^-1 X_s', which can be viewed as a matrix of weights W so that equation 3-3
can be written as b = Wy. The matrix W has these dimensions: the number of rows equals the
number of parameters in β, and the number of columns equals the number of plot means in y.
The i-th row of W contains the weights applied to y to estimate the i-th parameter in b (b_i). In
the discussion which follows gca_1 is utilized as b_i.

If there are no missing plots, the cross mean in every block (y_ijk) has the same weighting
and weights can be combined across blocks to yield the weight on the overall cross mean (y_.jk).
It can be shown that for the balanced numerical example gca_1 is calculated by weighting the
overall cross means containing parent 1 by 1/6 and weighting all overall cross means not
containing parent 1 by -1/12. Figure 3-4 (above the diagonal) demonstrates the weightings on
the overall cross means for the balanced numerical example as well as the marginal weighting on
the GCA parameters. These marginal weightings are obtained by summing along a row and/or
column as one would to obtain the marginal totals for a parent (Becker 1975). One feature of
sum-to-zero solutions is that these marginal weightings will be maintained no matter the
imbalance due to missing crosses, as will be seen by considering the numerical examples for a
missing cross (Figure 3-4 below the diagonal, upper number) and five missing crosses (Figure
3-4 below the diagonal, lower number). The marginal weights have remained the same as in the
balanced case while the weights on the cross means differ among the crosses containing parent
1 and also among the crosses not containing parent 1. In the five missing crosses example, two
of the crosses not containing parent 1 (one of them y_.26) even receive a positive weighting
where in the prior examples they had negative weighting.

Figure 3-4. Weights on overall cross means (y_.jk) for the three numerical examples for
estimation of GCA_1. The weights for the balanced example (above the diagonal) are presented
in both fractional and decimal form. The weights for the one-cross missing and the five-crosses
missing are presented as the upper number and lower number, respectively, in cells below the
diagonal. The marginal weights on GCA parameters (right margin) do not change although cells
are missing.
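The balanced-case weights of 1/6 and -1/12 can be reproduced from the weight matrix W = (X'X)^-1 X'. The sketch below is a simplified check, not the full analysis of this chapter: it assumes a GCA-only model fitted to the 15 overall cross means of a six-parent half-diallel with sum-to-zero coding, which gives the same GCA solutions as the full model only because SCA is orthogonal to GCA in the balanced case. The helper `gca_row` is hypothetical.

```python
import itertools
import numpy as np

p = 6
crosses = list(itertools.combinations(range(1, p + 1), 2))  # 15 crosses

def gca_row(j, k):
    """Sum-to-zero GCA coding: columns for parents 1..p-1, parent p = -(sum of others)."""
    row = np.zeros(p - 1)
    for parent in (j, k):
        if parent < p:
            row[parent - 1] += 1.0
        else:
            row -= 1.0
    return row

# Design matrix for the overall cross means: a mean column plus five GCA columns.
X = np.array([np.concatenate(([1.0], gca_row(j, k))) for j, k in crosses])

# Weight matrix; row 0 is the mean, rows 1..5 are gca_1..gca_5.
W = np.linalg.inv(X.T @ X) @ X.T
w_gca1 = dict(zip(crosses, W[1]))

for cross, w in w_gca1.items():
    print(cross, round(w, 4))
```

Crosses involving parent 1 receive the positive weight and all others the negative weight; the weights sum to zero, so the mean and block effects cancel from the expectation of gca_1.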

The expected value in all three examples is GCA_1s (the sum-to-zero parameter) despite the
apparently nonsensical weightings to cross means with missing crosses; however, the evaluation
of the estimates in terms of the original model changes with each new combination of missing
cells, e.g., two cross means not involving parent 1 have a positive weight in GCA_1 estimation
in the five missing crosses example. Whether this type of estimation is desirable with missing
cell (cross) means has been the subject of some discussion (Speed, Hocking and Hackney 1978,
Freund 1980, and Milliken and Johnson 1984). The data analyst should be aware of the manner in
which sum-to-zero treats the data with missing cell means and decide whether that particular
linear combination of cross means estimating the parameter is one of interest, realizing that
the meaning of the estimates in terms of the original model is changing.
Diallel Mean

The  use  of  the  mean  for  a  half-diallel  as  the  mean  around  which  GCA's  sum-to-zero  is 
not  satisfactory  in  that  the  diallel  mean  is  the  mean  of  a  rather  narrow  genetically  based 
population,  and  in  particular  that  the  comparisons  of  interest  are  not  usually  confined  to  the 
specific  parents  in  a  specific  diallel  on  a  particular  site.  A  checklot  can  be  employed  to  represent 
a  base  population  against  which  comparison  of  half-  or  full-sib  families  can  be  made  to  provide 
for  comparison  of  GCA  estimates  from  other  tests  (van  Buijtenen  and  Bridgwater  1986). 

Mathematically,  when  effects  are  forced  to  sum-to-zero  around  their  own  mean,  the 
absolute  value  of  the  GCA's  is  reflective  of  their  value  relative  to  the  mean  of  the  group.  Even 
if  the  parents  involved  in  the  particular  diallel  were  all  far  superior  to  the  population  mean  for 
GCA,  GCA's  calculated  on  an  OLS  basis  would  show  that  some  of  these  GCA's  were  negative. 
If  the  GCA's  of  the  diallel  parents  were  in  fact  all  below  the  population  mean,  the  opposite  and 
equally  undesirable  result  ensues.  For  disconnected  diallels  together  on  a  single  site,  an  OLS 
analysis  would  yield  GCA  estimates  that  sum-to-zero  within  each  diallel  since  parents  are  nested 
within  diallels.  Unless  the  comparisons  of  interest  are  only  in  the  combination  of  the  parents  in 
a  specific  diallel  on  a  specific  site,  the  checklot  alternative  is  desirable. 

A  method  for  obtaining  the  desired  goal  of  comparable  GCA's  from  disconnected 
experiments,  disregarding  the  problem  of  heteroscedasticity,  is  to  form  a  function  from  the  data 
which  yields  GCA  estimates  properly  located  on  the  number  scale.  Such  a  function  can  be 
formed (using GCA_1 as an example) from gca_1s, the diallel mean, and the checklot mean.

From expectations of the scalar linear model (equation 3-1),

$$GCA_{1s} = \frac{p-1}{p}\,GCA_1 - \frac{1}{p}\sum_{j=2}^{p} GCA_j + \frac{1}{p}\sum_{k=2}^{p} SCA_{1k} - \frac{2}{p(p-2)}\sum_{j=2}^{p-1}\sum_{\substack{k=3\\ k>j}}^{p} SCA_{jk}; \tag{3-5}$$

$$E\{\text{diallel mean}\} = \mu + \frac{1}{b}\sum_{i=1}^{b} B_i + \frac{2}{p}\sum_{j=1}^{p} GCA_j + \frac{2}{p(p-1)}\sum_{j=1}^{p-1}\sum_{\substack{k=2\\ k>j}}^{p} SCA_{jk}; \text{ and}$$

$$E\{\text{checklot mean}\} = \mu + \frac{1}{b}\sum_{i=1}^{b} B_i + \tau;$$

where j for GCA is j or k and τ represents the fixed genetic parameter of the checklot. The
function used to properly locate GCA_1rel (the subscript rel denotes the relocated GCA_1s) is
gca_1rel = gca_1s + (1/2)(diallel mean - checklot mean). The expectation of gca_1rel with
negligible SCA is GCA_1rel = GCA_1 - τ/2; and since breeding value equals twice GCA,
BV_1rel = BV_1 - τ. If SCA is non-negligible then the expectation is

$$GCA_{1rel} = GCA_1 + \frac{1}{p-1}\sum_{k=2}^{p} SCA_{1k} - \frac{1}{(p-1)(p-2)}\sum_{j=2}^{p-1}\sum_{\substack{k=3\\ k>j}}^{p} SCA_{jk} - \frac{\tau}{2}. \tag{3-6}$$

In  either  case  the  function  provides  a  reasonable  manner  by  which  GCA  estimates  from 
disconnected  diallels  are  centered  at  the  same  location  on  a  number  scale  and  are  then 
comparable. 
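For the negligible-SCA case, the relocation result can be traced in a few lines. The derivation below is a sketch that uses only the three expectations stated above, with τ denoting the checklot parameter, b the number of blocks and B_i the block effects; the block and mean terms cancel because they appear in both the diallel and the checklot mean.

```latex
\begin{aligned}
E\{gca_{1rel}\}
  &= E\{gca_{1s}\} + \tfrac{1}{2}\left(E\{\text{diallel mean}\} - E\{\text{checklot mean}\}\right) \\
  &= \left[\frac{p-1}{p}\,GCA_1 - \frac{1}{p}\sum_{j=2}^{p} GCA_j\right]
     + \tfrac{1}{2}\left[\frac{2}{p}\sum_{j=1}^{p} GCA_j - \tau\right] \\
  &= \left[GCA_1 - \frac{1}{p}\sum_{j=1}^{p} GCA_j\right]
     + \frac{1}{p}\sum_{j=1}^{p} GCA_j - \frac{\tau}{2} \\
  &= GCA_1 - \frac{\tau}{2}.
\end{aligned}
```

The average GCA of the diallel parents, which is what the sum-to-zero restriction removes, is restored by the half-difference of the two means, so the relocated value no longer depends on which parents happen to share the diallel.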

Variance  and  Covariance  of  Plot  Means 

The variances of plot means with unequal numbers of trees per plot are by definition
unequal, i.e., Var(y_ijk) = σ²_p + σ²_w/n_ijk where σ²_p is plot variance, σ²_w is the within
plot variance and n_ijk is the number of observations per plot. Also, if blocks were considered
random, there would be an additional source of variance for plot means due to blocks (as well as
a covariance between plot means in the same block) and this could be incorporated into the V
matrix with Var(y_ijk) = σ²_b + σ²_p + σ²_w/n_ijk. Since the variances of the means in the
observation vector are not equal and there is a covariance between the means if blocks are
being considered random, best linear unbiased estimates (BLUE) would be secured by weighting
each mean by its true associated variance (Searle 1987, page 316). This is the generalized
least squares (GLS) approach as

$$b = (X_s' V^{-1} X_s)^{-1} X_s' V^{-1} y \tag{3-7}$$

The GLS approach relaxes the OLS assumptions of equal variance of and no covariance between
the observations (plot means) while still treating genetic parameters as fixed effects. The
entries along the diagonal of the V matrix are the variances of the plot means (Var(y_ijk)) in
the same order as means in the data vector. The off-diagonal elements of V would be either 0 or,
for elements corresponding to observations in the same block, σ²_b (the variance due to the
random variable block). BLUE requires exact knowledge of V; if estimates of σ²_p, σ²_b, and
σ²_w are utilized in the V matrix, estimable functions of β approximate BLUE.
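A minimal numerical sketch of equation 3-7 follows. The data, the two-column design matrix and the variance values are invented for illustration, and the helper names `plot_mean_covariance` and `gls` are hypothetical. The sketch builds V from the plot and within-plot variances (optionally adding a block variance to the diagonal and to same-block off-diagonals) and compares GLS with OLS.

```python
import numpy as np

def plot_mean_covariance(n_trees, block, var_plot, var_within, var_block=0.0):
    """V for plot means: diagonal var_plot + var_within / n, plus var_block within blocks."""
    n_trees = np.asarray(n_trees, dtype=float)
    block = np.asarray(block)
    V = np.diag(var_plot + var_within / n_trees)
    same_block = (block[:, None] == block[None, :]).astype(float)
    return V + var_block * same_block

def gls(X, y, V):
    """b = (X' V^-1 X)^-1 X' V^-1 y, using linear solves instead of explicit inverses."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

# Hypothetical example: a mean column and one sum-to-zero treatment contrast.
X = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, -1.0], [1.0, -1.0]])
y = np.array([10.0, 12.0, 7.0, 9.0])
block = [0, 1, 0, 1]

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
V_equal = plot_mean_covariance([6, 6, 6, 6], block, var_plot=0.5, var_within=8.0)
V_unequal = plot_mean_covariance([6, 1, 6, 6], block, var_plot=0.5, var_within=8.0)
b_equal = gls(X, y, V_equal)      # equal precision: matches OLS
b_unequal = gls(X, y, V_unequal)  # the single-tree plot mean is discounted

print(b_ols, b_equal, b_unequal)
```

With equal numbers of trees per plot the GLS and OLS solutions coincide; when one plot mean rests on a single tree, GLS gives it less weight and the solution moves toward the better-measured plot.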

The  OLS  assumption  that  SCA  and  GCA  are  fixed  effects  can  also  be  relaxed  to  allow 
for  covariances  due  to  genetic  relatedness.  In  particular,  the  information  that  means  are  from  the 
same  half-  or  full-sib  family  could  be  included  in  the  V  matrix.  Relaxation  of  the  zero  covariance 
assumption  implies  that  GCA  and  SCA  are  random  variables.  If  GCA  and  SCA  are  treated  as 
random  variables,  then  the  application  of  best  linear  prediction  (BLP)  or  best  linear  unbiased 
prediction  (BLUP)  to  the  problem  would  be  more  appropriate  (White  and  Hodge  1989,  page  64). 
The  treatment  of  the  genetic  parameters  as  random  variables  is  consistent  with  that  used  in 
estimating  genetic  correlations  and  heritabilities.  The  V  matrix  of  such  an  application  would 
include,  in  addition  to  the  features  of  the  GLS  V  matrix,  the  covariance  between  full-sib  or  half- 
sib  families  added  to  the  off-diagonal  elements  in  V,  i.e.,  if  the  first  and  second  plot  means  in 
the  data  vector  had  a  covariance  due  to  relationship,  then  that  covariance  is  inserted  twice  in  the 
V  matrix.  The  covariance  would  appear  as  the  second  element  in  the  first  row  and  the  first 
element in the second row of V (V is a symmetric matrix). Also the diagonal elements of V
would increase by 2σ²_gca (the variance due to treating GCA as a random variable) + σ²_sca (the
variance due to treating SCA as a random variable).
Comparison of Prediction and Estimation Methodologies

Which  methodology  (OLS,  GLS,  BLP,  or  BLUP)  to  apply  to  individual  data  bases  is 
somewhat  a  subjective  decision.  The  decision  can  be  based  both  on  the  computational  or 
conceptual  complexity  of  the  method  and  the  magnitude  of  the  data  base  with  which  the  analyst 
is  working.  To  aid  in  this  decision,  this  discussion  highlights  the  differences  in  the  inherent 
properties  and  assumptions  of  the  techniques. 

For all practical purposes the answers from the four techniques will never be equal;
however, there are two exceptions. First, OLS estimates equal GLS estimates if all the cell
means are known with the same precision (variance) (Searle 1987, page 490). Otherwise, GLS
discounts the means that are known with less precision in the calculations and different
estimates result. Second, if the amount of data is infinite, i.e., all cross means are known
without error, then all four techniques are equivalent (White and Hodge 1989, pages 104-106).
In all other cases BLP and BLUP shrink predictions toward the location parameter(s) and produce
predictions which are different from OLS or GLS estimates even with balanced data. During
calculations GLS, BLP, and BLUP place less weight on observations known with less precision,
which is intuitively pleasing.
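The shrinkage behaviour described above can be sketched with a deliberately simplified case. The code assumes a one-way random model with a known mean (not the diallel model of this chapter), in which the BLP of a family effect is the observed deviation multiplied by var_genetic / (var_genetic + var_error / n). All names and numbers are hypothetical.

```python
def blp_shrinkage(n_obs, var_genetic, var_error):
    """Weight applied to an observed deviation from the (assumed known) mean."""
    return var_genetic / (var_genetic + var_error / n_obs)

def blp_predict(family_mean, n_obs, mu, var_genetic, var_error):
    """Predicted family effect: the OLS-type deviation shrunk toward zero."""
    return blp_shrinkage(n_obs, var_genetic, var_error) * (family_mean - mu)

# Two hypothetical families with the same observed deviation (+1.0 above the mean)
# but very different amounts of testing.
mu, var_genetic, var_error = 10.0, 0.25, 8.0
poorly_tested = blp_predict(11.0, n_obs=2, mu=mu,
                            var_genetic=var_genetic, var_error=var_error)
well_tested = blp_predict(11.0, n_obs=200, mu=mu,
                          var_genetic=var_genetic, var_error=var_error)

print(poorly_tested, well_tested)
```

Under OLS both families would be credited with the full deviation; under BLP the poorly tested family is pulled much closer to the mean, and as the number of observations grows the prediction approaches the OLS value.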

With  OLS  and  GLS  forest  geneticists  treat  GCA's  and  SCA's  as  fixed  effects  for 
estimation  and  then  as  random  variables  for  genetic  correlations  and  heritabilities.  BLP  and 
BLUP  provide  a  consistent  treatment  of  GCA's  and  SCA's  as  random  variables  while  differing 
in  their  assumptions  about  location  parameters  (fixed  effects).  In  BLP  fixed  effects  are  assumed 
known  without  error  (although  they  are  usually  estimated  from  the  data)  while  with  BLUP  fixed 
effects  are  estimated  using  GLS.  BLP  and  BLUP  techniques  also  contain  the  assumption  that  the 
covariance  matrix  of  the  observations  is  known  without  error  (most  often  variances  must  be 
estimated). In many BLUP applications (Henderson 1974), mixed model equations are utilized
iteratively to estimate fixed effects and to predict random variables from a data set. A BLUP
treatment of fixed effects allows any connectedness between experiments to be utilized in the
estimation of the fixed effects. This provides an intuitive advantage of BLUP over BLP in
experimentation where connectedness among genetic experiments is available or where the data
are so unbalanced that treating the fixed effects as known is less desirable than a GLS estimate
of the fixed effects.

An  ordering  of  computational  complexity  and  conceptual  complexity  from  least  to  most 
complex  of  the  four  methods  is  OLS,  GLS,  BLP  and  BLUP.  The  latter  three  methods  require 
the  estimation  of  the  covariance  matrix  of  the  observations  either  separately  (a  priori)  or 
iteratively  with  the  fixed  effects.  Precise  estimation  of  the  covariance  matrix  for  observations 
requires  a  great  number  of  observations  and  the  precision  of  GLS,  BLP  and  BLUP  estimations 
or  predictions  is  affected  by  the  error  of  estimation  of  the  components  of  V. 

Selection  of  a  method  can  then  be  based  on  weighing  the  computational  complexity  and 
size  of  the  available  data  base  against  the  advantages  offered  by  each  method.  Thus,  if 
complexity  of  the  computational  problem  is  of  paramount  concern,  the  analyst  necessarily  would 
choose  OLS.  With  a  small  data  base  (one  that  does  not  allow  reasonable  estimates  of  variances), 
the  analyst  would  again  choose  OLS.  With  a  large  data  base  and  no  qualms  with  computational 
complexity,  the  analyst  can  choose  between  BLP  and  BLUP  based  on  whether  there  is  sufficient 
connectedness  or  imbalance  among  the  experiments  to  make  BLUP  advantageous. 

Conclusions 

Methods  of  solving  for  GCA  and  SCA  estimates  for  balanced  (plot-mean  basis)  and 
unbalanced  data  have  been  presented  along  with  the  inherent  assumptions  of  the  analysis.  The 
use of plot means and the matrix equations will produce sum-to-zero OLS estimates for GCA and
SCA for all types of imbalance. Formulae in the literature which yield OLS solutions for
balanced data can yield misleading solutions for unbalanced data because orthogonality is lost
and because the weightings these formulae place on site means for crosses (or totals) are
constants.

GCA's  and  SCA's  obtained  through  sum-to-zero  restriction  are  not  truly  estimates  of 
parametric  population  GCA's  and  SCA's.  There  are  an  infinite  number  of  solutions  for  GCA's 
and  SCA's  from  the  system  of  equations  as  a  result  of  the  overparameterized  linear  model.  Yet, 
if  the  only  comparisons  of  interest  are  among  the  specific  parents  on  a  particular  site,  then  the 
estimates  calculated  by  sum-to-zero  restrictions  are  appropriate.  Checklots  may  be  used  to 
provide  comparability  among  estimates  derived  from  disconnected  sets. 

Knowledge of the innate mathematical features of OLS analysis discussed here should help
the data analyst decide if OLS is the most desirable technique for the data at hand. It may be
desirable to relax the OLS assumptions about the covariance matrix of the observations, which
are in all likelihood invalid. This could lead to GLS, BLP or BLUP as better alternatives.


CHAPTER  4 

VARIANCE  COMPONENT  ESTIMATION  TECHNIQUES 

COMPARED  FOR  TWO  MATING  DESIGNS 

WITH  FOREST  GENETIC  ARCHITECTURE 

THROUGH  COMPUTER  SIMULATION 


Introduction 

In  many  applications  of  quantitative  genetics,  geneticists  are  commonly  faced  with  the 
analysis  of  data  containing  a  multitude  of  flaws  (e.g.  non-normality,  imbalance,  and 
heteroscedasticity).  Imbalance,  as  one  of  these  flaws,  is  intrinsic  to  quantitative  forest  genetics 
research  because  of  the  difficulty  in  making  crosses  for  full-sib  tests  and  the  biological  realities 
of  long  term  field  experiments.  Few  definitive  studies  have  been  conducted  to  establish  optimal 
methods  for  estimation  of  variance  components  from  unbalanced  data.  Simulation  studies  using 
simple  models  (one-way  or  two-way  random  models)  have  been  conducted  for  certain  data 
structures,  i.e.,  imbalance,  experimental  design,  and  variance  parameters  (Corbeil  and  Searle 
1976,  Swallow  1981,  Swallow  and  Monahan  1984,  interpretations  by  Littell  and  McCutchan 
1986).  The  results  from  these  studies  indicate  that  technique  optimality  is  a  function  of  the  data 
structure. 

In practice (both historically and still commonplace), estimation of variance components
in forest genetics applications has been achieved by using sequentially adjusted sums of squares
as an application of Henderson's Method 3 (HM3, Henderson 1953). Under normality and with
balanced data, this technique has the desirable property of being the minimum variance unbiased
estimator. If the data are unbalanced, then the only property retained by HM3 estimation is
unbiasedness (Searle 1971, Searle 1987 pp. 492, 493, 498). Other estimators have been shown
to be locally superior to HM3 in variance or mean square error properties in certain cases
(Klotz et al. 1969, Olsen et al. 1976, Swallow 1981, Swallow and Monahan 1984).

Over  the  last  25  years,  there  has  been  a  proliferation  of  variance  component  estimation 
techniques  including  minimum  norm  quadratic  unbiased  estimation  (MINQUE,  Rao  1971a), 
minimum  variance  quadratic  unbiased  estimation  (MIVQUE,  Rao  1971b),  maximum  likelihood 
(ML,  Hartley  and  Rao  1967),  and  restricted  maximum  likelihood  (REML,  Patterson  and 
Thompson  1971).  The  practical  application  of  these  techniques  has  been  impeded  by  their 
computational  complexity.  However,  with  continuing  advances  in  computer  technology  and  the 
appearance  of  better  computational  algorithms,  the  application  of  these  procedures  continues  to 
become  more  tractable  (Harville  1977,  Geisbrecht  1983,  Meyer  1989).  Whether  these  methods 
of  analysis  are  superior  to  HM3  for  many  genetics  applications  remains  to  be  shown. 

With  balanced  data  and  disregarding  negative  estimates,  all  previously  mentioned 
techniques  except  ML  produce  the  same  estimates  (Harville  1977).  With  unbalanced  data,  each 
technique  produces  a  different  set  of  variance  component  estimates.  Criteria  must  then  be 
adopted  to  discriminate  among  techniques.  Candidate  criteria  for  discrimination  include 
unbiasedness  (large  number  convergence  on  the  parametric  value),  minimum  variance  (estimator 
with  the  smallest  sampling  variance),  minimum  mean  square  error  (minimum  of  sampling 
variance  plus  squared  bias,  Hogg  and  Craig  1978),  and  probability  of  nearness  (probability  that 
sample  estimates  occur  in  a  certain  interval  around  the  parametric  value,  Pitman  1937). 
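The four criteria can be computed directly from a set of simulated estimates. The sketch below is an illustration under stated assumptions: `summarize` is a hypothetical helper, the variance uses the population divisor so that mean square error equals variance plus squared bias exactly, and probability of nearness is taken, as defined above, to be the proportion of estimates within a chosen half-width of the parametric value (the half-width is an arbitrary choice here).

```python
import numpy as np

def summarize(estimates, true_value, half_width):
    """Bias, sampling variance, MSE and probability of nearness for one estimator."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    variance = est.var(ddof=0)
    mse = np.mean((est - true_value) ** 2)
    nearness = np.mean(np.abs(est - true_value) <= half_width)
    return {"bias": bias, "variance": variance, "mse": mse, "nearness": nearness}

# Hypothetical variance component estimates from four simulated data sets.
result = summarize([0.2, 0.3, 0.1, 0.4], true_value=0.25, half_width=0.1)
print(result)
```

In a simulation with 1000 data sets per experimental level, the same function would be applied to the 1000 estimates produced by each technique, and the techniques compared on the resulting summaries.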

Negative estimates are also problematic in the estimation of variance components. Five
alternatives for dealing with the dilemma of estimates less than zero (outside the natural
parameter space of zero to infinity) are (Searle 1971): 1) accept and use the negative estimate,
2) set the negative estimate to zero (producing biased estimates), 3) re-solve the system with
the offending component set to zero, 4) use an algorithm which does not allow negative
estimates, and 5) use the negative estimate to infer that the wrong model was utilized.

The  purpose  of  this  research  was  to  determine  if  the  criteria  of  unbiasedness,  minimum 
variance,  minimum  mean  square  error,  and  probability  of  nearness  discriminated  among  several 
variance  component  estimation  techniques  while  exploring  various  alternatives  for  dealing  with 
negative  variance  component  estimates.  In  order  to  make  such  comparisons,  a  large  number  of 
data  sets  were  required  for  each  experimental  level.  Using  simulated  data,  this  chapter  compares 
variance  component  estimation  techniques  for  plot-mean  and  individual  observations,  two  mating 
systems  (modified  half-diallel  and  half-sib)  and  two  sets  of  parametric  variance  components. 
Types  of  imbalance  and  levels  of  factors  were  chosen  to  reflect  common  situations  in  forest 
genetics. 

Methods 

Experimental  Approach 

For  each  experimental  level  1000  data  sets  were  generated  and  analyzed  by  various 
techniques  (Table  4-1)  producing  numerous  sets  of  variance  component  estimates  for  each  data 
set.  This  workload  resulted  in  enormous  computational  time  being  associated  with  each 
experimental  level.  The  overall  experimental  design  for  the  simulation  was  originally  conceived 
as  a  factorial  with  two  types  of  mating  design  (half-diallel  and  half-sib),  two  sets  of  true  variance 
components  (Table  4-2),  two  kinds  of  observations  (individual  and  plot  mean)  and  three  types  of 
imbalance: 1) survival levels (80% and 60%, with 80% representing moderate survival and 60%
representing poor survival); 2) for full-sib designs three levels of missing crosses (0, 2, and
5 out of 15 crosses); and 3) for half-sib designs two levels of connectedness among tests (15
and 10 common families between tests out of 15 families per test). Because of the computational
time constraint, the experiment could not be run as a complete factorial and the investigation
continued as a partial factorial. In general, the approach was to run levels which were at
opposite ends of the imbalance spectrum, i.e., 80% survival and no missing crosses versus 60%
survival and 5 missing crosses, within a variance component level. If results were consistent
across these treatment combinations, intermediate levels were not run.

Table 4-1. Abbreviation for and description of variance component estimation methods utilized
for analyses based on individual observations (if utilized for plot-mean analysis the
abbreviation is modified by prefixing a 'P').

| Abbreviation | Description | Citation |
|---|---|---|
| ML, PML | Maximum Likelihood: estimates not restricted to the parameter space (individual and plot-mean analysis). | Hartley and Rao 1967; Shaw 1987 |
| MODML | Maximum Likelihood: negative estimates set to zero after convergence (individual analysis). | Hartley and Rao 1967 |
| NNML | Maximum Likelihood: if negative estimates appeared at convergence, they were set to zero and the system re-solved (individual analysis). | Hartley and Rao 1967; Miller 1973 |
| REML, PREML | Restricted Maximum Likelihood: estimates not restricted to the parameter space (individual and plot-mean analysis). | Patterson and Thompson 1971; Shaw 1987; Harville 1977 |
| MODREML | Restricted Maximum Likelihood: negative estimates set to zero after convergence (individual analysis). | Patterson and Thompson 1971 |
| NNREML, PNNREML | Restricted Maximum Likelihood: if negative estimates appeared at convergence, they were set to zero and the system re-solved (individual and plot-mean analysis). | Patterson and Thompson 1971; Miller 1983 |
| MIVQUE, PMIVQUE | Minimum Variance Quadratic Unbiased: non-iterative with true (parametric) values of the variance components as priors (individual and plot-mean analysis). | Rao 1971b |
| MINQUE1, PMINQUE1 | Minimum Norm Quadratic Unbiased: non-iterative with ones as priors for all variance components (individual and plot-mean analysis). | Rao 1971a |
| TYPE3, PTYPE3 | Sequentially Adjusted Sums of Squares; Henderson's Method 3 (individual and plot-mean analysis). | Henderson 1953 |
| MIVPEN | MIVQUE with a penalty algorithm to prevent negative estimates (individual analysis). | Harville 1977 |
Designation of a treatment combination is by a five-character alphanumeric field. The first
character is either "H" (half-sib) or "D" (half-diallel). The second character denotes the set
of parametric variance components, where "1" designates the set of variance components
associated with heritability of 0.1 and "2" designates the set of variance components
associated with heritability of 0.25 (Table 4-2). The third character is an "S" indicating that
the last two characters determine the imbalance level. The fourth character designates the
survival level, either "6" for 60% or "8" for 80%. The final character specifies the number of
missing crosses (half-diallel) or lack of connectedness (half-sib). The treatment combination
'H1S80' is a half-sib mating design (H), the set of variance components associated with
heritability equalling 0.1 (1), 80% survival (8), and 15 common parents across tests (0).
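The coding scheme can be made concrete with a small decoder. The function below is a hypothetical convenience, not part of the original simulation software. For the half-sib design the final character is returned unchanged as a connectedness code, because the text only states that "0" corresponds to 15 common parents across tests.

```python
def decode_treatment(code):
    """Decode a five-character treatment designation such as 'H1S80'."""
    if len(code) != 5 or code[2] != "S":
        raise ValueError(f"not a valid treatment designation: {code!r}")
    design = {"H": "half-sib", "D": "half-diallel"}[code[0]]
    heritability = {"1": 0.1, "2": 0.25}[code[1]]
    survival = {"6": 0.60, "8": 0.80}[code[3]]
    last = int(code[4])
    decoded = {"design": design, "heritability": heritability, "survival": survival}
    if design == "half-diallel":
        decoded["missing_crosses"] = last
    else:
        # Assumption: kept as a raw code; only 0 (15 common parents) is documented.
        decoded["connectedness_code"] = last
    return decoded

print(decode_treatment("H1S80"))
print(decode_treatment("D2S65"))
```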


Table 4-2. Sets of true variance components for the half-diallel and half-sib mating designs
generated from specification of two levels of single-tree heritability (h²), type B correlation
(r_B), and non-additive to additive variance ratio (d/a).

Columns h², r_B and d/a are the genetic ratios (a); the remaining columns are the true variance
components (b).

| h² | r_B | d/a | Mating design | σ²_t | σ²_b | σ²_g | σ²_s | σ²_tg | σ²_ts | σ²_p | σ²_w |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.5 | 1.0 | full-sib | 1.0 | 0.5 | 0.25 | 0.25 | 0.25 | 0.25 | .595 | 7.905 |
| | | | half-sib | 1.0 | 0.5 | 0.25 | NA | 0.25 | NA | .475 | 7.9964 |
| 0.25 | 0.8 | .25 | full-sib | 1.0 | 0.5 | 0.625 | .1562 | .1562 | .0391 | .5769 | 7.6649 |

a  h² = 4σ²_g / σ²_phenotypic; r_B = 4σ²_g / (4σ²_g + 4σ²_tg); and σ²_D / σ²_A as d/a = 4σ²_s / 4σ²_g.
b  See definitions in equation 4-1.


Experimental  Design  for  Simulated  Data 


The mating design for the simulation was either a six-parent half-diallel (no selfs) or a
fifteen-parent half-sib. The randomized complete block field design was in three locations (i.e.,
separate field tests) with four complete blocks per location and six trees per family in a block,
where family is a full-sib family for the half-diallel or a half-sib family for the half-sib design.
This field design and the mating designs reflect typical designs in forestry applications (Squillace
1973, Wilcox et al. 1975, Bridgwater et al. 1983, Weir and Goddard 1986, Loo-Dinkins et al. 1991)
and are also commonly used in other disciplines (Matzinger et al. 1959, Hallauer and Miranda
1981, Singh and Singh 1984). The six trees per family could be considered as contiguous or
non-contiguous plots without affecting the results or inferences.
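As a quick check on the size of a complete (balanced) data set under this design, the counts can be worked out directly. The sketch below is illustrative only; the function name `design_size` is hypothetical and not part of the original simulation.

```python
from math import comb

def design_size(n_families, tests=3, blocks=4, trees=6):
    """Number of trees planted when every family appears in every block of every test."""
    return n_families * tests * blocks * trees

# Six-parent half-diallel without selfs or reciprocals: one full-sib family per pair of parents.
half_diallel_families = comb(6, 2)
# Fifteen-parent half-sib design: one half-sib family per parent.
half_sib_families = 15

n_half_diallel = design_size(half_diallel_families)
n_half_sib = design_size(half_sib_families)
print(half_diallel_families, n_half_diallel, n_half_sib)
```

Both designs therefore carry the same number of families and the same number of observations before any deletion for survival or missing crosses.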

Full-Sib  Linear  Model 

The scalar linear model employed for half-diallel individual observations is

$$y_{ijklm} = \mu + t_i + b_{ij} + g_k + g_l + s_{kl} + tg_{ik} + tg_{il} + ts_{ikl} + p_{ijkl} + w_{ijklm} \tag{4-1}$$

where $y_{ijklm}$ is the $m$th observation of the $kl$th cross in the $j$th block of the $i$th test;
$\mu$ is the population mean;
$t_i$ is the random variable test location ~ NID(0, $\sigma^2_t$);
$b_{ij}$ is the random variable block ~ NID(0, $\sigma^2_b$);
$g_k$ is the random variable female general combining ability (gca) ~ NID(0, $\sigma^2_g$);
$g_l$ is the random variable male gca ~ NID(0, $\sigma^2_g$);
$s_{kl}$ is the random variable specific combining ability (sca) ~ NID(0, $\sigma^2_s$);
$tg_{ik}$ is the random variable test by female gca interaction ~ NID(0, $\sigma^2_{tg}$);
$tg_{il}$ is the random variable test by male gca interaction ~ NID(0, $\sigma^2_{tg}$);
$ts_{ikl}$ is the random variable test by sca interaction ~ NID(0, $\sigma^2_{ts}$);
$p_{ijkl}$ is the random variable plot ~ NID(0, $\sigma^2_p$);
$w_{ijklm}$ is the random variable within-plot ~ NID(0, $\sigma^2_w$); and
there is no covariance between random variables in the model.
This linear model in matrix notation is (dimensions below model component)

$$\underset{n\times 1}{y} = \underset{n\times 1}{\mu\mathbf{1}} + \underset{n\times t}{Z_T}\,\underset{t\times 1}{e_T} + \underset{n\times b}{Z_B}\,\underset{b\times 1}{e_B} + \underset{n\times g}{Z_G}\,\underset{g\times 1}{e_G} + \underset{n\times s}{Z_S}\,\underset{s\times 1}{e_S} + \underset{n\times tg}{Z_{TG}}\,\underset{tg\times 1}{e_{TG}} + \underset{n\times ts}{Z_{TS}}\,\underset{ts\times 1}{e_{TS}} + \underset{n\times p}{Z_P}\,\underset{p\times 1}{e_P} + \underset{n\times 1}{e_W} \tag{4-2}$$

where $y$ is the observation vector;
$Z_i$ is the portion of the design matrix for the $i$th random variable;
$e_i$ is the vector of unobservable random effects for the $i$th random variable;
$\mathbf{1}$ is a vector of 1's; and
n, t, b, g, s, tg, ts, and p are the number of observations, tests, blocks, gca's, sca's, test
by gca interactions, test by sca interactions and plots, respectively.
Utilizing customary assumptions in half-diallel mating designs (Method 4, Griffing 1956), the
variance of an individual observation is

$$\mathrm{Var}(y_{ijklm}) = \sigma^2_t + \sigma^2_b + 2\sigma^2_g + \sigma^2_s + 2\sigma^2_{tg} + \sigma^2_{ts} + \sigma^2_p + \sigma^2_w; \tag{4-3}$$

and in matrix notation the covariance matrix for the observations is

$$\mathrm{Var}(y) = Z_T Z_T'\sigma^2_t + Z_B Z_B'\sigma^2_b + Z_G Z_G'\sigma^2_g + Z_S Z_S'\sigma^2_s + Z_{TG} Z_{TG}'\sigma^2_{tg} + Z_{TS} Z_{TS}'\sigma^2_{ts} + Z_P Z_P'\sigma^2_p + I_n\sigma^2_w \tag{4-4}$$

where " ' " indicates the transpose operator, all matrices of the form $Z_i Z_i'$ are n×n, and $I_n$ is an
n×n identity matrix.
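The covariance structure of equation 4-4 can be assembled numerically from incidence matrices. The following is a minimal sketch for a small balanced half-diallel, not the code used in the original study; the helper names `indicator` and `half_diallel_cov` and the reduced dimensions (four parents, two tests, two blocks, two trees per plot) are assumptions chosen only to keep the example small. The variance components are the first full-sib set of Table 4-2.

```python
import numpy as np
from itertools import combinations, product

def indicator(labels):
    """Incidence matrix with one column per distinct label."""
    levels = sorted(set(labels))
    Z = np.zeros((len(labels), len(levels)))
    for row, label in enumerate(labels):
        Z[row, levels.index(label)] = 1.0
    return Z

def half_diallel_cov(parents, tests, blocks, trees, s2):
    """Var(y) of Eq. 4-4 for a balanced half-diallel (no selfs, no reciprocals)."""
    crosses = list(combinations(range(parents), 2))
    # One (test, block, female, male) label per tree; trees in a plot share the label.
    obs = [(i, j, k, l) for i, j, (k, l), _ in
           product(range(tests), range(blocks), crosses, range(trees))]
    n = len(obs)
    # gca incidence: each tree receives a 1 for its female AND its male parent.
    Z_G = np.zeros((n, parents))
    Z_TG = np.zeros((n, tests * parents))
    for row, (i, j, k, l) in enumerate(obs):
        Z_G[row, [k, l]] = 1.0
        Z_TG[row, [i * parents + k, i * parents + l]] = 1.0
    Z = {
        "t": indicator([(i,) for i, j, k, l in obs]),
        "b": indicator([(i, j) for i, j, k, l in obs]),
        "g": Z_G,
        "s": indicator([(k, l) for i, j, k, l in obs]),
        "tg": Z_TG,
        "ts": indicator([(i, k, l) for i, j, k, l in obs]),
        "p": indicator(obs),
    }
    V = s2["w"] * np.eye(n)
    for name, Zi in Z.items():
        V += s2[name] * Zi @ Zi.T
    return V

s2 = {"t": 1.0, "b": 0.5, "g": 0.25, "s": 0.25,
      "tg": 0.25, "ts": 0.25, "p": 0.595, "w": 7.905}
V = half_diallel_cov(parents=4, tests=2, blocks=2, trees=2, s2=s2)
print(V[0, 0], V[0, 1])  # variance of one tree; covariance of two trees in the same plot
```

Because both parents of a cross enter the gca incidence matrix, the diagonal of V carries $2\sigma^2_g$ and $2\sigma^2_{tg}$, which is the coefficient pattern of equation 4-3.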

Half-sib  Linear  Model 

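The scalar linear model for half-sib individual observations is

$$y_{ijkm} = \mu + t_i + b_{ij} + g_k + tg_{ik} + ph_{ijk} + wh_{ijkm} \tag{4-5}$$

where $y_{ijkm}$ is the $m$th observation of the $k$th half-sib family in the $j$th block of the $i$th test;
$\mu$, $t_i$, $b_{ij}$, $g_k$, and $tg_{ik}$ retain the definition in Eq. 4-1;
$ph_{ijk}$ is the random variable plot containing different genotype by environment
components than the corresponding term in Eq. 4-1 ~ NID(0, $\sigma^2_{ph}$);
$wh_{ijkm}$ is the random variable within-plot containing different levels of genotypic and
genotype by environment components than the corresponding term in Eq. 4-1
~ NID(0, $\sigma^2_{wh}$); and
there is no covariance between random variables in the model.

The matrix notation model is (dimensions below model component)

$$\underset{n\times 1}{y} = \underset{n\times 1}{\mu\mathbf{1}} + \underset{n\times t}{Z_T}\,\underset{t\times 1}{e_T} + \underset{n\times b}{Z_B}\,\underset{b\times 1}{e_B} + \underset{n\times g}{Z_G}\,\underset{g\times 1}{e_G} + \underset{n\times tg}{Z_{TG}}\,\underset{tg\times 1}{e_{TG}} + \underset{n\times p}{Z_P}\,\underset{p\times 1}{e_{PH}} + \underset{n\times 1}{e_{WH}} \tag{4-6}$$

The variance of an individual observation in half-sib designs is

$$\mathrm{Var}(y_{ijkm}) = \sigma^2_t + \sigma^2_b + \sigma^2_g + \sigma^2_{tg} + \sigma^2_{ph} + \sigma^2_{wh} \tag{4-7}$$

and

$$\mathrm{Var}(y) = Z_T Z_T'\sigma^2_t + Z_B Z_B'\sigma^2_b + Z_G Z_G'\sigma^2_g + Z_{TG} Z_{TG}'\sigma^2_{tg} + Z_P Z_P'\sigma^2_{ph} + I_n\sigma^2_{wh} \tag{4-8}$$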

For an observational vector based on plot means, the plot and within-plot random
variables were combined by taking the arithmetic mean across the observations within a plot.
The resulting plot-means model has a new plot-level variance term (in place of $\sigma^2_p$ or
$\sigma^2_{ph}$) that is a composite of the plot and within-plot variance terms of the individual
observation model.

Three estimates of ratios among variance components were determined: 1) single-tree
heritability adjusted for test location and block as $h^2 = 4\hat{\sigma}^2_g / \hat{\sigma}^2_{phenotypic}$, where
$\hat{\sigma}^2_{phenotypic}$ is the estimate of the variance of an individual observation from equations 4-3
and 4-7 with the variance components for test location and block deleted; 2) type B correlation as
$r_B = 4\hat{\sigma}^2_g / (4\hat{\sigma}^2_g + 4\hat{\sigma}^2_{tg})$; and 3) dominance to additive variance ratio as
d/a $= 4\hat{\sigma}^2_s / 4\hat{\sigma}^2_g$.
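The three ratios can be computed directly from a set of variance components. The sketch below is an illustration only; the function name `genetic_ratios` and its argument names are hypothetical. It applies equations 4-3 and 4-7 with the test and block components deleted, and is checked against the true values of Table 4-2.

```python
def genetic_ratios(s2_g, s2_tg, s2_plot, s2_within, s2_s=None, s2_ts=None):
    """Return (h2, rB, d/a); d/a is None for half-sib data, which carry no sca terms."""
    if s2_s is None:
        # Half-sib phenotypic variance: Eq. 4-7 without test and block.
        phenotypic = s2_g + s2_tg + s2_plot + s2_within
        d_over_a = None
    else:
        # Full-sib phenotypic variance: Eq. 4-3 without test and block.
        phenotypic = 2 * s2_g + s2_s + 2 * s2_tg + s2_ts + s2_plot + s2_within
        d_over_a = (4 * s2_s) / (4 * s2_g)
    h2 = 4 * s2_g / phenotypic
    r_b = 4 * s2_g / (4 * s2_g + 4 * s2_tg)
    return h2, r_b, d_over_a

# First and second full-sib sets of true variance components from Table 4-2.
set1 = genetic_ratios(0.25, 0.25, 0.595, 7.905, s2_s=0.25, s2_ts=0.25)
set2 = genetic_ratios(0.625, 0.1562, 0.5769, 7.6649, s2_s=0.1562, s2_ts=0.0391)
print(set1, set2)
```

The first set returns the specified ratios (0.1, 0.5, 1.0); the second returns them to within the rounding of the tabulated components.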

Data  Generation  and  Deletion 

Data generation was accomplished by using a Cholesky upper-lower decomposition of the
covariance matrix for the observations (Goodnight 1979) and a vector of pseudo-random standard
normal deviates generated using the Box-Muller transformation with pseudo-random uniform
deviates (Knuth 1981, Press et al. 1989). The upper-lower decomposition creates a matrix (U)
with the property that Var(y) = U'U. The vector of pseudo-random standard normal deviates
(z) has a covariance matrix equal to an identity matrix ($I_n$) where n is the number of observations.
The vector of observations is created as y = U'z. Then Var(y) = U'(Var(z))U and since Var(z)
= $I_n$, Var(y) = U'$I_n$U = U'U.
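The generation step can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the original program: `numpy.linalg.cholesky` returns the lower factor L with V = LL', so U is taken as its transpose; the uniform deviates come from NumPy's default generator rather than the generators cited above; and the toy covariance matrix (three families of two trees) is hypothetical.

```python
import numpy as np

def box_muller(n, rng):
    """Return n standard normal deviates built from pairs of uniform deviates."""
    half = (n + 1) // 2
    u1 = 1.0 - rng.random(half)          # shift to (0, 1] so log(u1) is finite
    u2 = rng.random(half)
    radius = np.sqrt(-2.0 * np.log(u1))
    z = np.concatenate([radius * np.cos(2.0 * np.pi * u2),
                        radius * np.sin(2.0 * np.pi * u2)])
    return z[:n]

def generate(V, rng):
    """Return (y, U) with y = U'z, where U is upper triangular and V = U'U."""
    U = np.linalg.cholesky(V).T          # NumPy gives the lower factor; transpose it
    z = box_muller(V.shape[0], rng)
    return U.T @ z, U

# Toy covariance matrix: family variance 0.25 plus residual variance 7.9.
Z = np.kron(np.eye(3), np.ones((2, 1)))
V = 0.25 * Z @ Z.T + 7.9 * np.eye(6)
y, U = generate(V, np.random.default_rng(1))
print(np.allclose(U.T @ U, V))
```

A population mean can be added to y afterwards; it does not affect the covariance structure.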

Analyses of survival patterns using data from the Cooperative Forest Genetic Research
Program (CFGRP) at the University of Florida were used to develop survival distributions for
the simulation. The data sets chosen for survival analysis were from full-sib slash pine (Pinus
elliottii var. elliottii Engelm.) tests planted in randomized complete block designs with the families
in row plots and were selected because the survival levels were either approximately 60% or
80%. Survival levels for most crosses (full-sib families) clustered around the expected value,
i.e., approximately 60% for an average survival level of 60%; however, there were always a few
crosses that had much poorer survival than average and also a small number of crosses that had
much better survival than average. This survival pattern was consistent across the 50 experiments
analyzed. Thus, a lower than average survival level was arbitrarily assigned to certain crosses,
a higher than average survival level was assigned to certain crosses, and the average survival
level was assigned to most crosses. This modeling of survival pattern was also extended to the
half-sib mating design. At 80% survival no missing plots were allowed, and at 60% survival missing
plots occurred at random.

Full-sib  family  deletion  simulated  crosses  which  could  not  be  made  and  were  therefore 
missing  from  the  experiment.  When  deleting  five  crosses,  the  deletion  was  restricted  to  a 
maximum  of  four  crosses  per  parent  to  prevent  loss  of  all  the  crosses  in  which  a  single  parent 
appeared  since  this  would  have  resulted  in  changing  a  six-parent  to  a  five-parent  half-diallel. 
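One way to honor this restriction is to draw five crosses at random and redraw whenever any parent would lose all of its crosses. The text does not state how the restriction was enforced, so the rejection scheme below and the function name `delete_crosses` are assumptions made for illustration.

```python
import random
from itertools import combinations

def delete_crosses(parents=6, n_delete=5, max_per_parent=4, rng=None):
    """Return the crosses that remain after a constrained random deletion."""
    rng = rng or random.Random()
    crosses = list(combinations(range(parents), 2))   # 15 crosses for six parents
    while True:
        missing = rng.sample(crosses, n_delete)
        # Count how many of the deleted crosses involve each parent.
        lost = [sum(p in cross for cross in missing) for p in range(parents)]
        if max(lost) <= max_per_parent:
            return [cross for cross in crosses if cross not in missing]

remaining = delete_crosses(rng=random.Random(1))
print(len(remaining), sorted({p for cross in remaining for p in cross}))
```

Each parent appears in five crosses of the complete six-parent half-diallel, so capping the loss at four guarantees that all six parents remain represented.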

Tests having only subsets of the half-sib families in common are a frequent occurrence
in data analysis at CFGRP. This partial connectedness was simulated by generating data in which
only 10 of the 15 families present in a test were common to either one of the other two tests
comprising a data set.

Variance  Component  Estimation  Techniques 

Two algorithms were utilized for all estimation techniques: sequentially adjusted sums
of squares (Milliken and Johnson 1984, p 138) for HM3; and Giesbrecht's algorithm (Giesbrecht
1983) for REML, ML, MINQUE and MIVQUE. Giesbrecht's algorithm is primarily a gradient
algorithm (the method of scoring), and as such allows negative estimates (Harville 1977,
Giesbrecht 1983). Negative estimates are not a theoretical difficulty with MINQUE or MIVQUE;
however, for REML and ML, estimates should be confined to the parameter space. For this
reason, estimators referred to as REML and ML in this chapter are not truly REML and ML when
negative estimates occur; further, there is the possibility that the iterative solution stopped at a
local maximum rather than the global maximum. These concerns are commonplace in REML and ML
estimation (Corbeil and Searle 1976, Harville 1977, Swallow and Monahan 1984); with these two
caveats noted, the estimators are still referred to as REML and ML.

The basic equation for variance component estimation under normality (Giesbrecht 1983)
for MIVQUE, MINQUE and REML is (dimensions below model component)

$$\underset{r\times r}{\{\mathrm{tr}(QV_iQV_j)\}}\;\underset{r\times 1}{\hat{\sigma}^2} = \underset{r\times 1}{\{y'QV_iQy\}} \tag{4-9}$$

then $\hat{\sigma}^2 = \{\mathrm{tr}(QV_iQV_j)\}^{-1}\{y'QV_iQy\}$;

and for ML

$$\underset{r\times r}{\{\mathrm{tr}(V^{-1}V_iV^{-1}V_j)\}}\;\underset{r\times 1}{\hat{\sigma}^2} = \underset{r\times 1}{\{y'QV_iQy\}} \tag{4-10}$$

where $\{\mathrm{tr}(QV_iQV_j)\}$ is a matrix whose elements are $\mathrm{tr}(QV_iQV_j)$ where in the full-sib
designs i = 1 to 8 and j = 1 to 8, i.e., there is a row and column for
every random variable in the linear model;
tr is the trace operator, that is, the sum of the diagonal elements of a matrix;
$Q = V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}$ for V as the covariance matrix of y and X as
the design matrix for fixed effects;
$V_i = Z_iZ_i'$, where i = the random variables test, block, etc.;
$\hat{\sigma}^2$ is the vector of variance component estimates; and
r is the number of random variables in the model.
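A single application of equation 4-9 can be sketched in a few lines of NumPy. This is an illustrative reading of the equation, not Giesbrecht's published code; the function name `scoring_step` and the balanced one-way example data are assumptions. With priors of one for every component the step is MINQUE1, and with the true values as priors it is MIVQUE.

```python
import numpy as np

def scoring_step(y, X, Z_list, priors):
    """One non-iterative solution of Eq. 4-9; the last prior belongs to the error term."""
    n = y.size
    V_parts = [Z @ Z.T for Z in Z_list] + [np.eye(n)]       # V_i = Z_i Z_i'
    V = sum(p * Vi for p, Vi in zip(priors, V_parts))
    V_inv = np.linalg.inv(V)
    VX = V_inv @ X
    Q = V_inv - VX @ np.linalg.solve(X.T @ VX, VX.T)
    QV = [Q @ Vi for Vi in V_parts]
    lhs = np.array([[np.trace(A @ B) for B in QV] for A in QV])   # {tr(Q V_i Q V_j)}
    Qy = Q @ y
    rhs = np.array([Qy @ Vi @ Qy for Vi in V_parts])              # {y' Q V_i Q y}
    return np.linalg.solve(lhs, rhs)

# Hypothetical balanced one-way layout: a = 6 families, n = 4 trees each.
a, n = 6, 4
Z = np.kron(np.eye(a), np.ones((n, 1)))
rng = np.random.default_rng(0)
y = 10.0 + Z @ np.linspace(-3.0, 3.0, a) + rng.standard_normal(a * n)
X = np.ones((a * n, 1))
minque1 = scoring_step(y, X, [Z], priors=[1.0, 1.0])

# ANOVA estimates for comparison.
family_means = y.reshape(a, n).mean(axis=1)
msb = n * np.sum((family_means - y.mean()) ** 2) / (a - 1)
msw = np.sum((y.reshape(a, n) - family_means[:, None]) ** 2) / (a * (n - 1))
anova = np.array([(msb - msw) / n, msw])
print(minque1, anova)
```

For balanced data the solution does not depend on the priors and coincides with the analysis of variance estimates, which gives a convenient check; the priors matter once the data are unbalanced.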
The MINQUE estimator used was MINQUE1, i.e., ones as priors for all variance
components, calculated by applying Giesbrecht's algorithm non-iteratively. MINQUE1 was
chosen because of results demonstrating MINQUE0 (prior of 1 for the error term and of 0 for
all others) to be an inferior estimation technique for many cases (Swallow and Monahan 1984,
R.C. Littell unpublished data).

With normally-distributed uncorrelated random variables, the use of the true values of
the variance components as priors in a non-iterative application of Giesbrecht's algorithm
produced the MIVQUE solutions (equation 4-9). Obtaining true MIVQUE estimation is a luxury
of computer simulation and would not be possible in practice since the true variance components
are required (Swallow and Searle 1978). This estimator was included to provide a standard of
comparison for other estimators. An additional MIVQUE-type estimator, referred to as
MIVPEN, was also included. MIVPEN was also a non-iterative application of the algorithm with
the true variance components as priors; however, this estimator was conditioned on the variance
component parameter space and did not allow negative estimates. The non-negative conditioning
of MIVPEN was accomplished by adding a penalty algorithm to MIVQUE such that no variance
component was allowed to be less than $1\times10^{-7}$. Estimates from MIVPEN were equal to MIVQUE
for data sets for which there were no negative MIVQUE variance component estimates. When
negative MIVQUE estimates occurred, the two techniques were no longer equivalent. The penalty
algorithm operated by using $\Delta = \hat{\sigma}^2 - \sigma^2$ and by choosing a scalar weight w such that no element
of $\hat{\sigma}^2_{pen}$ is less than $1\times10^{-7}$. Then $\hat{\sigma}^2_{pen} = \sigma^2 + w\Delta$, where $\Delta$ is the vector of departure from the
true values ($\sigma^2$), $1\times10^{-7}$ is an arbitrary constant and $\hat{\sigma}^2_{pen}$ is the vector of estimated variance
components conditioned on non-negativity.
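The penalty step can be illustrated as follows. The text specifies only that a scalar weight w is chosen so that no element falls below the floor; taking the largest such w (at most 1) is an assumption of this sketch, as are the function name `penalize` and the example values. The sketch also assumes every true component exceeds the floor.

```python
import numpy as np

def penalize(estimates, true_values, floor=1e-7):
    """Shrink a vector of estimates toward the true values until all are >= floor."""
    estimates = np.asarray(estimates, dtype=float)
    true_values = np.asarray(true_values, dtype=float)
    delta = estimates - true_values                  # departure from the true values
    below = estimates < floor
    if not below.any():
        return estimates.copy()                      # identical to MIVQUE in this case
    # Largest common weight that lifts every offending element to the floor.
    w = np.min((floor - true_values[below]) / delta[below])
    return true_values + w * delta

true_values = np.array([0.25, 0.25, 7.905])
mivque = np.array([0.30, -0.10, 8.00])               # hypothetical estimates, one negative
mivpen = penalize(mivque, true_values)
print(mivpen)
```

Because a single weight is applied to the whole vector, every component is pulled toward its true value whenever any one component is negative, which is consistent with the much smaller sampling variance reported for MIVPEN.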

REML estimates were from repeated application of Giesbrecht's algorithm (equation 4-9)
in which the estimates from the $k$th iteration become the priors for the $(k+1)$th iteration. The
iterations were stopped when the difference between the estimates from the $k$th and $(k+1)$th
iterations met the convergence criterion; then the estimates of the $(k+1)$th iteration became the
REML estimates. The convergence criterion utilized was
$\sum_{i=1}^{r} |\hat{\sigma}^2_{i(k)} - \hat{\sigma}^2_{i(k+1)}| < 1\times10^{-4}$. This
criterion imposed convergence to the fourth decimal place for all variance components. Since
for this experimental workload it was desired that the simulation run with little analyst
intervention and in as few iterations as possible, the robustness of REML solutions obtained from
Giesbrecht's algorithm to priors (or starting points) was explored. The difference in solutions
starting from two distinct points (a vector of ones and the true values) was compared over 2000
data sets of different structures (imbalance, true variance components, and field design). The
results (agreeing with those of Swallow and Monahan 1984) indicated that the difference between
the two solutions was entirely dependent on the stringency of the convergence criterion and not
on the starting point (priors). Also the number of iterations required for convergence was greatly
decreased by using the true values as priors. Thus, all REML estimates were calculated starting
with the true values as priors.
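The iteration and stopping rule described above can be sketched as a loop around equation 4-9. This is a minimal illustration, not the original program; the function name `reml`, the iteration cap, and the balanced one-way example data are assumptions. Negative estimates are accepted, as in the unrestricted REML of this chapter.

```python
import numpy as np

def reml(y, X, Z_list, start, tol=1e-4, max_iter=100):
    """Iterate Eq. 4-9 until the summed absolute change in the estimates is below tol."""
    n = y.size
    V_parts = [Z @ Z.T for Z in Z_list] + [np.eye(n)]
    est = np.asarray(start, dtype=float)
    for iteration in range(1, max_iter + 1):
        V = sum(s * Vi for s, Vi in zip(est, V_parts))   # priors = previous estimates
        V_inv = np.linalg.inv(V)
        VX = V_inv @ X
        Q = V_inv - VX @ np.linalg.solve(X.T @ VX, VX.T)
        QV = [Q @ Vi for Vi in V_parts]
        lhs = np.array([[np.trace(A @ B) for B in QV] for A in QV])
        Qy = Q @ y
        rhs = np.array([Qy @ Vi @ Qy for Vi in V_parts])
        new = np.linalg.solve(lhs, rhs)
        if np.sum(np.abs(new - est)) < tol:
            return new, iteration
        est = new
    raise RuntimeError("convergence criterion not met")

# Hypothetical balanced one-way layout: 6 families of 4 trees.
a, n = 6, 4
Z = np.kron(np.eye(a), np.ones((n, 1)))
rng = np.random.default_rng(0)
y = 10.0 + Z @ np.linspace(-3.0, 3.0, a) + rng.standard_normal(a * n)
X = np.ones((a * n, 1))

from_ones, iters_ones = reml(y, X, [Z], start=[1.0, 1.0])
from_other, iters_other = reml(y, X, [Z], start=[5.0, 1.0])
print(from_ones, from_other, iters_ones, iters_other)
```

In this balanced example both starting points reach the same solution almost immediately; with unbalanced data more iterations are needed, and the benefit of good starting values reported in the text would show up in the iteration count.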

Three alternatives for coping with negative estimates after convergence were used for
REML solutions: accept and use the negative estimates (Shaw 1987), arbitrarily set negative
estimates to zero, and re-solve the system setting negative estimates to zero (Miller 1973). The
first two alternatives are self-explanatory and the latter is accomplished by re-analyzing those data
sets in which the initial unrestricted REML estimates included one or more negative estimates.
During re-analysis, if a variance component became negative, it was set to zero (could never be
any value other than zero) and the iterations continued. This procedure persisted until the
convergence criterion was met with a solution in which all variance components were either
positive or zero.

Harville (1977) suggested several adaptations of Henderson's mixed model equations
(Henderson et al. 1959) which do not allow variance component estimates to become negative;
however, the estimates can become arbitrarily close to zero. These techniques were tried against
the approach of setting negative estimates to zero after convergence and re-solving the system.
Comparison of results using the same data sets indicates that there is little practical advantage
in using the approach suggested by Harville, although it is more desirable theoretically. The
differences between sets of estimates obtained by the two methods are extremely minor (solving
the system with a variance component set to zero versus arbitrarily close to zero).

ML solutions, as iterative applications of equation 4-10, were calculated from the same
starting points and with the same convergence criterion as REML solutions. The three negative
variance component alternatives explored for ML were to accept and use the negative estimates,
to arbitrarily set negative estimates to zero after converging to a solution, and (for
half-sib data only) to re-solve the system setting negative variance components to zero.

The algorithm to calculate solutions for HM3 (sequentially adjusted sums of squares) was
based on the upper triangular G2 sweep (Goodnight 1979) and Hartley's method of synthesis
(Hartley 1967). The equation solved was $E\{MS\}\hat{\sigma}^2 = MS$ where MS is the vector of mean
squares and E{MS} is their expectation. The alternative used for negative estimates was to accept
and use the negative estimates.

Comparison Among Estimation Techniques

For  the  simulation  MIVQUE  estimates  were  the  basis  for  all  comparisons  because 
MIVQUE  is  by  definition  the  minimum  variance  quadratic  unbiased  estimator.  The  results  of 
comparing  the  mean  of  1000  MIVQUE  estimates  for  an  experimental  level  to  the  means  for  other 
techniques  were  termed  "apparent  bias".  "Apparent  bias"  denotes  that  1000  data  sets  were  not 
sufficient  to  achieve  complete  convergence  to  the  true  values  of  the  variance  components. 

Sampling  variances  of  estimation  were  calculated  from  the  1000  observations  within  an 
experimental  level  and  estimation  technique  for  variance  components  and  genetic  ratios  (single 
tree  heritability,  Type  B  correlation  and  dominance  to  additive  variance  ratio).  Mean  square 
error  then  equalled  variance  plus  squared  "apparent  bias".  While  mean  square  error  was 
investigated,  there  was  never  sufficient  bias  for  mean  square  error  to  lead  to  a  different  decision 
concerning  techniques  than  sampling  variance  of  the  estimates;  so  mean  square  error  was  deleted 
from  the  remainder  of  this  discussion. 

Probability of nearness is the probability that an estimate will lie within a certain interval
around the true parameter. The three total interval widths utilized were one-half, equal to, and
twice the parameter size. The percentages of the 1000 estimates falling within these intervals were
calculated for the different estimation techniques within an experimental level for variance
components and ratios and utilized as estimates of probability of nearness.
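The comparison criteria can be summarized in a short routine. This sketch assumes that the interval for probability of nearness is centered on the true parameter, so a total width of one-half the parameter means the estimate lies within one-quarter of the parameter on either side, and that the sampling variance uses the n - 1 divisor; neither detail is stated in the text. The function name `compare` and the five example estimates are hypothetical.

```python
import numpy as np

def compare(estimates, reference_mean, true_value):
    """Sampling variance, apparent bias, mean square error and probability of nearness."""
    estimates = np.asarray(estimates, dtype=float)
    variance = estimates.var(ddof=1)
    apparent_bias = estimates.mean() - reference_mean    # relative to the MIVQUE mean
    mse = variance + apparent_bias ** 2
    nearness = {
        width: float(np.mean(np.abs(estimates - true_value) <= width * true_value / 2))
        for width in (0.5, 1.0, 2.0)                     # total interval width / parameter
    }
    return variance, apparent_bias, mse, nearness

estimates = [0.01, 0.20, 0.25, 0.30, 0.60]
variance, apparent_bias, mse, nearness = compare(estimates, reference_mean=0.25,
                                                 true_value=0.25)
print(variance, apparent_bias, mse, nearness)
```

In the simulation the same summary would be applied to the 1000 estimates from each technique within an experimental level, with the MIVQUE mean supplying `reference_mean`.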

Results are presented by variance component or genetic ratio estimated as a percentage
of MIVQUE (except in the case of probability of nearness). MIVQUE estimates represent 100%;
estimates with greater variance have values larger than 100%, and "apparently biased"
estimates have values different from 100%. The percentages were calculated as 100
times the estimate divided by the MIVQUE value. For the criterion of variance, the lower the
percentage the better the estimator performed; for bias, values equalling 100% (0 bias) are
preferred; and for probability of nearness, larger percentages (probabilities) are favored since
they are indicative of greater density of estimates near the parametric value.

Results  and  Discussion 
Variance  Components 

Sampling  variance  of  the  estimators 

For all variance components estimated, REML and ML estimation techniques were
consistently equal to or less than MIVQUE for sampling variance of the estimator (Table 4-3).
The variance among estimates from these techniques was further reduced by setting the negative
components to zero (MODML and MODREML) or setting negative estimates to zero plus
re-solving the system (NNREML, NNML, and PNNREML). Variance among MINQUE1 estimates
is always equal to or greater than for MIVQUE, as one might expect, since they are, in this
application, the same technique with MIVQUE having perfect priors (the true values). Variances
for HM3 estimators (TYPE3 and PTYPE3) are either equal to or greater than MIVQUE; HM3
estimates have progressively larger relative variance with higher levels of imbalance. MIVPEN,
although impractical because of the need for the true priors, had much more precise estimates of
variance components than other techniques, illustrating what could be accomplished given the true
values as priors plus maintaining estimates within the parameter space.

In general, the spread among the percentages for variance of estimation for the estimation
techniques is highly dependent on the degree of imbalance and the type of mating system. With
increasing imbalance the likelihood-based estimators realized greater advantage for sampling
variance of the estimates over HM3 for both mating systems. The most advantageous application
of likelihood-based estimators is in the H1S65 case where the imbalance is not only random
deletions of individuals but also incomplete connectedness across locations, i.e., the same families
are not present in each test (akin to incomplete blocks within a test).

Table 4-3. Sampling variance for the estimates of $\sigma^2_g$ (upper number), $\sigma^2_{tg}$ (second number), and
$h^2$ (third number where calculated) as a percentage of the MIVQUE estimate by type of estimator
and treatment combination; NA is not applied. Values greater than 100 indicate larger variance
among 1000 estimates.

Estimator     D1S80    D1S65    D2S65    H1S80    H1S65

REML           99.9    102.6    101.5     99.6    106.3
              100.2    100.0    104.1     99.7     98.0
              100.0    101.0    101.4     99.6    105.8

ML             77.3     78.2     76.4     95.9    103.9
              106.9    104.8    110.7    100.8     99.1
               82.5     82.9     86.4     96.2    103.8

MINQUE1       100.0    104.2    104.0    104.0    146.7
              101.2    118.8    123.6    112.5    139.7
              100.3    105.8    103.9    104.0    145.8

NNREML         80.8     71.6     95.2     88.0     68.6
               67.9     48.3     54.9     78.7     48.6
               76.8     64.2     92.2     87.3     67.7

NNML             NA       NA       NA     83.3     65.3
                                          79.4     48.9
                                          83.1     64.7

MODML          58.2     50.0     69.5     84.7     74.6
               12.8     81.4     81.6     86.6     68.5
               58.1     46.1     72.0     83.8     71.4

MODREML        81.5     74.5     96.1     88.9     78.1
               89.1     74.0     73.7     85.4     66.9
               76.4     63.5     88.9     87.7     74.3

TYPE3         101.0    101.0    105.5    100.6    121.0
              101.1    101.0    115.5    100.9    125.6
              100.5    108.4    102.9    100.4    121.6

PREML         100.3    106.3    101.7    107.5    146.9
              102.7    113.5    119.8    122.0    150.7

PML            77.6     81.9     77.1    103.6    143.4
              109.7    117.3    127.2    123.3    151.9

PMINQUE1      100.3    107.6    105.4    107.5    179.3
              102.7    129.0    137.3    122.0    180.6

PNNREML        80.9     71.1     93.9     92.7     86.6
               69.8     53.2     60.5     94.0     68.1

PTYPE3        100.3    106.6    105.4    107.5    168.1
              102.7    124.7    133.3    122.0    184.9
              100.6    110.8    104.1    106.9    168.0

MIVPEN           NA     36.2     29.1     80.0     45.6
                        26.6     20.0     74.3     39.6
                        34.7     30.2     79.8     45.4

PMIVQUE       100.3    104.2    102.4    107.5    146.9
              102.7    114.4    117.8    122.0    150.7

An  analysis  of  variance  was  conducted  to  determine  the  importance  of  the  treatment  of 
negative  variance  component  estimates  in  the  variance  of  estimation  for  REML  and  ML  estimates. 
The  model  of  sampling  variance  of  the  estimates  as  a  result  of  mating  design,  imbalance  level, 
treatment  of  negative  estimates  and  size  of  the  variance  component  demonstrated  consistently  (for 
all  variance  components  except  error)  that  treatment  of  negative  estimates  is  an  important 
component  of  the  variance  of  the  estimates  (p  <  .05).  The  model  accounted  for  up  to  99%  of 
the  variation  in  the  variance  of  the  variance  component  estimates  with  1)  accepting  and  using 
negative  estimates  producing  the  highest  variance;  2)  setting  the  negative  components  to  zero 
being  intermediate;  and  3)  re-solving  the  system  with  negative  estimates  set  to  zero  providing  the 
lowest  variance. 

For  all  estimation  techniques,  lower  variance  among  estimates  was  obtained  by  using 
individual  observations  as  compared  to  plot  means.   The  advantage  of  individual  over  plot-mean 
observations  increased  with  increasing  imbalance. 
Bias 

The most consistent performance for bias (Table 4-4) across all variance components was
TYPE3, known from inherent properties to be unbiased. The consistent convergence of the
TYPE3 value to the MIVQUE value indicated that the number of data sets used (1000 per
technique and experimental level) was suitable for the purpose of examining bias. The other two
consistent performers were REML and MINQUE1. PTYPE3 (HM3 based on plot means) was
unbiased when no plot means were missing, but produced "apparently biased" estimates when
plot means were missing.

Among estimators which displayed bias, maximum likelihood estimators (ML and PML)
were known to be inherently biased (Harville 1977, Searle 1987) with the amount of bias
proportional to the number of degrees of freedom for a factor versus the number of levels for the
factor. Other biases resulted from the method of dealing with negative estimates. Living with
negative estimates produced the estimators with the least bias. Setting negative variance
components to zero resulted in the greatest bias. Intermediate in bias were the estimates resulting
from re-solving the system with negative components set to zero.

Table 4-4. Bias for the estimates of $\sigma^2_g$ (upper number), $\sigma^2_{tg}$ (second number), and $h^2$ (third
number where calculated) as a percentage of the MIVQUE estimate by type of estimator and
experimental combination; NA is not applied. Values different from 100 denote "apparent" bias.

Estimator     D1S80    D1S65    D2S65    H1S80    H1S65

REML           99.9    101.5     98.7     99.9    102.8
               99.9    102.2     99.8     99.9     98.9
               99.9    101.3     98.6     99.9    102.6

ML             74.6     61.6     76.0     96.2     98.2
              106.5    114.6    109.7    101.3    101.8
               75.5     61.8     77.9     96.3     98.2

MINQUE         99.7     96.4     99.0     99.4    102.0
              100.1    100.8    101.3    100.8     98.3
               99.7     96.6     98.9     99.4    101.3

NNREML        107.9    116.5     98.1    101.9    107.8
               93.1     92.9     92.9    100.5    102.3
              108.7    118.4     98.2    102.2    107.7

NNML             NA       NA       NA    101.9    107.8
                                         100.5    102.3
                                          98.2    103.8

MODML          86.6     90.4     79.0     98.1    114.1
              109.9    129.9    127.4    101.3    122.9
               87.8     91.5     79.4     99.6    112.6

MODREML       109.5    124.2    100.6    103.1    117.8
              103.7    119.8    119.2    104.6    120.6
              109.5    123.2     98.4    102.9    116.2

TYPE3         100.1     99.4     99.6    100.2     99.6
              100.2    101.0    102.4    100.2    100.9
              100.0     99.5     99.3    100.2     99.7

PREML          99.7     98.7     97.7     99.5    110.6
              100.1    103.6    100.2    102.4     98.3

PML            74.2     58.5     73.6     95.9    105.2
              106.9    116.2    111.5    103.2    102.0

PMINQUE        99.7     95.2     98.8     99.5    106.5
              100.1    102.1    102.9    102.4    114.8

PNNREML       107.9    114.5     96.7    101.8    115.6
               92.9     94.0     95.0    104.5    110.2

PTYPE3         99.7     96.8     99.0     99.5    104.5
              100.1     97.2     96.0    102.4    108.7
               99.8     98.0     98.8     99.6    104.1

MIVPEN           NA    107.5     98.6    102.0    103.2
                        99.0     91.7    101.4    105.1
                       112.6    103.9    102.1    103.4

PMIVQUE        99.7     97.4     99.2     99.5    106.8
              100.1    101.7    100.5    102.4     98.8

Table 4-5. Probability of nearness for $\sigma^2_g$ (upper number), $\sigma^2_{tg}$ (second number), and $h^2$ (third
number where calculated). The probability interval is equal to the magnitude of the parameter.

Estimator     D1S80    D1S65    D2S65    H1S80    H1S65

REML           32.8     24.3     41.8     45.3     28.6
               43.0     26.2     25.7     36.6     27.1
               34.2     25.3     45.4     45.0     28.3

ML             33.6     22.3     40.7     45.4     29.2
               42.9     26.4     24.8     36.2     26.7
               34.6     22.3     45.0     45.7     28.2

MINQUE         32.6     24.6     41.0     45.1     26.1
               43.1     24.3     25.4     34.2     23.2
               33.7     25.0     44.6     44.7     25.6

NNREML         33.4     23.4     41.7     45.1     29.3
               44.9     28.1     25.6     38.0     28.9
               34.3     24.3     46.1     45.2     29.5

NNML             NA       NA       NA     45.9     29.7
                                          37.9     29.1
                                          46.0     29.0

TYPE3          34.0     23.2     42.5     45.3     27.1
               42.6     27.1     24.8     37.3     25.0
               35.3     23.8     45.8     45.9     27.3

PREML          32.1     20.0     41.6     43.7     24.6
               42.7     26.8     24.6     32.3     20.4

PML            33.5     19.8     39.7     44.0     24.4
               41.0     26.3     23.6     31.6     21.1

PMINQUE        32.1     21.4     40.4     43.7     24.5
               42.7     24.8     23.1     32.3     21.9

PNNREML        31.9     19.2     41.0     43.4     26.0
               43.3     28.0     23.3     33.1     21.3

PTYPE3         32.1     23.3     41.7     43.7     25.2
               42.7     25.4     24.1     32.3     22.4
               32.6     24.1     46.0     44.6     24.6

MIVQUE         33.6     25.7     43.7     45.1     29.2
               42.9     28.6     26.4     36.9     26.3
               34.8     26.8     47.7     45.4     29.4

MIVPEN           NA     41.1     78.5     48.4     35.6
                        47.0     60.3     39.2     31.2
                        42.4     80.5     48.7     35.3

PMIVQUE        32.1     20.0     41.8     43.7     25.9
               42.7     28.5     26.8     32.3     20.8
Probability  of  nearness 

Results for probability of nearness proved to be largely non-discriminatory among
techniques (Table 4-5). The low levels of probability density near the parametric values are
indicative of the nature of the variance component estimation problem. Figure 4-1 illustrates the
distribution of MIVQUE variance component estimates for $h^2$ (4-1a) and $\sigma^2_g$ (4-1b) for level
D1S80. The distributions for all unconstrained variance component estimates have the appearance
of a chi-square distribution, positively skewed with the expected value (mean) occurring to the
right of the peak probability density and a proportion of the estimates occurring below zero
(except error). With increasing imbalance, the variance among estimates increases and the
probability of nearness decreases for all interval widths.

Ratios  of  Variance  Components 

Single  tree  heritability 

Results for estimates of single tree heritability adjusted for locations and blocks are shown
in Tables 4-3 and 4-4 (third number from the top in each cell, if calculated). For these relatively
low heritabilities (0.1 and 0.25), the bias and variance properties of the estimated ratio are similar
to those for $\sigma^2_g$ estimates (Figure 4-1). This implies that knowing the properties of the numerator
of heritability reveals the properties of the ratio (especially true of ratios with expected values of
0.1 and 0.25, Kendall and Stuart 1963, Ch. 10). Variance component estimation techniques
which performed well for bias and/or variance among estimates for $\sigma^2_g$ also performed well for
$h^2$.

Figure 4-1. Distribution of 1000 MIVQUE estimates of $h^2$ (4-1a) and $\sigma^2_g$ (4-1b) for experimental
level D1S80 illustrating the positive skew and similarity of the distributions. The true values are
0.1 for $h^2$ and 0.25 for $\sigma^2_g$. The interval width of the bars is one-half the parametric value.

Type B correlation and dominance to additive variance ratio

Type  B  correlation  (Table  4-3  and  4-4  as  ff2^)  and  dominance  to  additive  variance  ratio 
(not  shown)  estimates  both  proved  to  be  too  unstable  (extremely  large  variance  among  estimates) 
in  their  original  formulations  to  be  useful  in  discrimination  among  variance  component  estimation 
techniques.  This  high  variance  is  due  to  the  estimates  of  the  denominators  of  these  ratios 
approaching  zero  and  to  the  high  variance  of  the  denominator  of  ratios  (Table  4-2).  These  ratios 
were reformulated with numerators of interest ($4\sigma^2_{tg}$ for additive genetic by test interaction and
$4\sigma^2_{s}$ for dominance variance, respectively) and a denominator equal to the estimate of the
phenotypic variance. With this reformulation the variance and bias properties of estimates of the
altered ratios are approximated by the properties of estimates of the numerators.

For  increasing  imbalance  maximum-likelihood-based  estimation  offers  an  increasing 
advantage  over  HM3,  and  for  all  techniques  individual  observations  offer  increasing  advantage 
over  plot-mean  observations  for  variance  of  the  estimates  of  these  ratios.  Bias,  other  than 
inherently  biased  methods  (ML),  is  associated  with  the  probability  of  negative  estimates  which 
is  increased  by  increasing  imbalance.  This  assertion  is  supported  by  comparing  the  biases  of 
REML, NNREML, and MODREML estimates across imbalance levels.

General Discussion

Observational  Unit 

Some  general  conclusions  regarding  the  choice  of  a  variance  component  estimation 
methodology  can  be  drawn  from  the  results  of  this  investigation.  For  any  degree  of  imbalance 
the  use  of  individual  observations  is  superior  to  the  use  of  plot  means  for  estimation  of  variance 
components or ratios of variance components. If the data are nearly balanced (close to 100%
survival  with  no  missing  plots,  crosses  (full-sib)  or  lack  of  connectedness  (half-sib)),  the 
properties  of  the  estimation  techniques  based  on  individual  and  plot-mean  observations  become 
similar;  so  if  departure  from  balance  is  nominal,  plot  means  can  be  used  effectively.  However, 
using  individual  observations  obviates  the  need  for  a  survey  of  imbalance  in  the  data  since 
individual  observations  produce  better  results  than  plot  means  for  any  of  the  estimation  techniques 
examined. 

Negative  Estimates 

Drawing  on  the  results  of  this  investigation,  the  discussion  of  practical  solutions  for  the 
negative estimates problem will revolve around two solutions: 1) accepting and using the negative
estimates; and 2) re-solving the system with negative estimates set to zero.

Given  that  the  property  of  interest  is  the  true  value  of  a  variance  component  or  genetic 
ratio,  often  estimated  as  a  mean  across  data  sets,  then  negativity  constraints  come  into  play  if  the 
component  of  interest  is  small  in  comparison  to  other  underlying  variance  components  in  the  data, 
or  the  variance  of  estimates  is  high  due  to  an  inadequate  experimental  design  for  variance 
component  estimation.  These  factors  lead  to  an  increased  number  of  negative  estimates.  If  the 
data  structure  is  such  that  negative  estimates  would  occur  frequently,  then  accepting  negative 
estimates is a good alternative.

If negative estimates tend to occur infrequently or bias is of less concern than variance
among estimates, then re-solving the system once convergence has yielded negative estimates
(with those estimates set to zero) is the preferable solution. This tactic reduces both bias and
variance among estimates below that of arbitrarily setting negative estimates to zero.
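
The bias cost of truncating negative estimates can be illustrated with a small simulation. The
sketch below is not one of the estimators compared in this chapter: it assumes a balanced one-way
random model and the ordinary ANOVA estimator of the among-family variance, and the function
names, parameter values and seed are hypothetical choices made for illustration. It contrasts
accepting negative estimates with arbitrarily setting them to zero; the re-solving strategy is not
shown because, in a one-way model, it changes only the error estimate.

```python
import random
from statistics import mean


def anova_sigma2_g(groups):
    """ANOVA estimate of the among-family variance, balanced one-way layout."""
    a, n = len(groups), len(groups[0])
    family_means = [mean(g) for g in groups]
    grand_mean = mean(family_means)
    ms_among = n * sum((m - grand_mean) ** 2 for m in family_means) / (a - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, family_means) for y in g
    ) / (a * (n - 1))
    return (ms_among - ms_within) / n


def simulate(true_g=0.1, true_w=0.9, a=6, n=4, reps=2000, seed=1):
    """Return (mean raw estimate, mean truncated estimate, share negative)."""
    rng = random.Random(seed)
    raw = []
    for _ in range(reps):
        groups = []
        for _ in range(a):
            g = rng.gauss(0.0, true_g ** 0.5)  # family effect
            groups.append([g + rng.gauss(0.0, true_w ** 0.5) for _ in range(n)])
        raw.append(anova_sigma2_g(groups))
    truncated = [max(0.0, e) for e in raw]  # negative estimates set to zero
    return mean(raw), mean(truncated), sum(e < 0 for e in raw) / reps


if __name__ == "__main__":
    mean_raw, mean_truncated, share_negative = simulate()
    print(mean_raw, mean_truncated, share_negative)
```

Because truncation can only raise an individual estimate, the mean of the truncated estimates
can never fall below the mean of the unconstrained estimates; the size of the gap depends on how
often negative estimates occur.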

Estimation  Technique 

The  primary  competitors  among  estimation  techniques  that  are  practically  achievable  are 
REML  and  TYPE3  (HM3).  Both  techniques  produce  estimates  with  little  or  no  bias;  however, 
REML  estimates  for  the  most  part  have  slightly  less  sampling  variance  than  TYPE3  estimates. 
If  only  subsets  of  the  parents  are  in  common  across  tests  as  in  the  case  H1S65,  REML  has  a 
distinct  advantage  in  variance  among  estimates  over  TYPE3. 

REML  does  have  three  additional  advantages  over  TYPE3  which  are  1)  REML  offers 
generalized  least  squares  estimation  of  fixed  effects  while  TYPE3  offers  ordinary  least  squares 
estimation;  2)  Best  Linear  Unbiased  Predictions  (BLUP)  of  random  variables  are  inherent  in 
REML  solutions,  i.e.,  gca  predictions  are  available;  and  thus  in  solving  for  the  variance 
components  with  REML,  fixed  effects  are  estimated  and  random  variables  are  predicted 
simultaneously  (Harville  1977);  and  3)  REML  offers  greater  flexibility  in  the  model  specification 
both  in  univariate  and  multivariate  forms  as  well  as  heterogeneous  or  correlated  error  terms. 
Further,  although  the  likelihood  equations  for  common  REML  applications  are  based  on 
normality,  the  technique  has  been  shown  to  be  robust  against  the  underlying  distribution  (Westfall 
1987,  Banks  et  al.  1985). 



Recommendation 

If  one  were  to  choose  a  single  variance  component  estimation  technique  from  among 
those  tested  which  could  be  applied  to  any  data  set  with  confidence  that  the  estimates  had 
desirable  properties  (variance,  MSE,  and  bias),  that  technique  would  be  REML  and  the  basic  unit 
of  observation  would  be  the  individual.  This  combination  (REML  plus  individual  observations) 
performed well across mating designs and types and levels of imbalance. Treatment of negative
estimates would be determined by the proposed use of the estimates, that is, whether unbiasedness
(accepting and using the negative estimates) is more important than sampling variance (re-solving
the system with negative estimates set to zero).

A  primary  disadvantage  of  REML  and  individual  observations  is  that  they  are  both 
computationally  expensive  (computer  memory  and  time).  HM3  estimation  could  replace  REML 
on  many  data  sets  and  plot  means  could  replace  individual  observations  on  some  data  sets;  but 
general  application  of  these  without  regard  to  the  data  at  hand  does  result  in  a  loss  in  desirable 
properties  of  the  estimates  in  many  instances. 

The  computational  expense  of  REML  and  individual  observations  ensures  that  estimates 
have  desirable  properties  for  a  broad  scope  of  applications.  With  the  advent  of  bigger  and  faster 
computers  and  the  evolution  of  better  REML  algorithms,  what  was  not  feasible  in  the  past  on 
most  mainframe  computers  can  now  be  accomplished  on  personal  computers. 


CHAPTER 5
GAREML: A COMPUTER ALGORITHM FOR ESTIMATING VARIANCE COMPONENTS AND
PREDICTING GENETIC VALUES


Introduction 

The  computer  program  described  in  this  chapter,  called  GAREML  for  Giesbrecht's 
algorithm  of  restricted  maximum  likelihood  estimation  (REML),  is  useful  for  both  estimating 
variance  components  and  predicting  genetic  values.  GAREML  applies  the  methodology  of 
Giesbrecht  (1983)  to  the  problems  of  REML  estimation  (Patterson  and  Thompson  1971)  and  best 
linear  unbiased  prediction  (BLUP,  Henderson  1973)  for  univariate  (single  trait)  genetics  models. 
GAREML  can  be  applied  to  half-sib  (open-pollinated  or  polymix)  and  full-sib  (partial  diallels, 
factorials,  half-diallels  [no  selfs]  or  disconnected  sets  of  half-diallels)  mating  designs  when  planted 
in  single  or  multiple  locations  with  single  or  multiple  replications  per  location.  When  used  for 
variance  component  estimation,  this  program  has  been  shown  to  provide  estimates  with  desirable 
properties  across  types  of  imbalance  commonly  encountered  in  forest  genetics  field  tests  (Huber 
et  al.  in  press)  and  with  varying  underlying  distributions  (Banks  et  al.  1985,  Westfall  1987). 
GAREML  is  also  useful  for  determining  efficiencies  of  alternative  field  and  mating  designs  for 
the  estimation  of  variance  components. 

Utilizing the power of mixed-model methodology (Henderson 1984), GAREML provides
BLUP of parental general (gca) and specific combining abilities (sca) as well as generalized least
squares (GLS) solutions for fixed effects. The application of BLUP to forest genetics problems
has been addressed by White and Hodge (1988, 1989). With certain assumptions, the desirable
properties of BLUP predictions include maximizing the probability of obtaining correct parental
rankings from the data and minimizing the error associated with using the parental values
obtained in future applications. GLS fixed effect estimation weights the observations comprising
the estimates by their associated variances, approximating best linear unbiased estimation (BLUE)
for fixed effects (Searle 1987, pp. 489-490).

The purpose of this chapter is to describe the theory and use of GAREML in enough
detail to facilitate use by other investigators. The program is written in FORTRAN and is not
dependent on other analysis programs. An interactive version of this program can be obtained
as a stand-alone executable file from the senior author; this file will run on any IBM compatible
PC under DOS or WINDOWS2 operating systems. The size of the problem an investigator can
solve will be dependent on the amount of extended memory and hard disk space (for swap files)
available for program use. In addition, the FORTRAN source code can be obtained for analysts
wishing to compile the program for use on alternate systems (e.g. mainframe computers).

Algorithm 

GAREML  proceeds  by  reading  the  data  and  forming  a  design  matrix  based  on  the  number 
of  levels  of  factors  in  the  model.  Any  portions  of  the  design  matrix  for  nested  factors  or 
interactions  are  formed  by  horizontal  direct  product.  Columns  of  zeroes  in  the  design  matrix  (the 
result  of  imbalance)  are  then  deleted.  The  design  matrix  columns  are  in  an  order  specified  by 
Giesbrecht's  algorithm:  columns  for  fixed  effects  are  first,  followed  by  the  data  vector,  and  the 
last  section  of  the  matrix  is  for  random  effects.  The  design  matrix  is  the  only  fully  formed 
matrix in the program. All other matrices are symmetric; therefore, to save computational space
and time, only the diagonal and the above-diagonal portions of matrices are formed and utilized
(i.e., half-stored).
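
The design-matrix steps described above can be sketched as follows. This is an illustrative
Python sketch, not the GAREML FORTRAN source: the helper names are hypothetical, and the row-by-row
packing of the upper triangle is one common convention that GAREML may or may not share.

```python
def indicator(levels, codes):
    """0/1 design columns: one column per level, one row per observation."""
    return [[1 if code == level else 0 for level in levels] for code in codes]


def horizontal_direct_product(a, b):
    """Row-wise Kronecker product: row i of the result is kron(a[i], b[i])."""
    return [[x * y for x in row_a for y in row_b] for row_a, row_b in zip(a, b)]


def drop_zero_columns(m):
    """Delete columns that are entirely zero (empty cells caused by imbalance)."""
    keep = [j for j in range(len(m[0])) if any(row[j] for row in m)]
    return [[row[j] for j in keep] for row in m]


def half_stored_crossproducts(m):
    """Upper triangle (diagonal included) of m'm, packed row by row."""
    p = len(m[0])
    return [sum(row[i] * row[j] for row in m) for i in range(p) for j in range(i, p)]


def packed_index(i, j, p):
    """Position of element (i, j) of a p x p symmetric matrix in the packed list."""
    if i > j:
        i, j = j, i
    return i * p - i * (i - 1) // 2 + (j - i)


if __name__ == "__main__":
    tests = ["A", "A", "B", "B"]
    families = ["f1", "f2", "f1", "f3"]
    t = indicator(["A", "B"], tests)
    f = indicator(["f1", "f2", "f3"], families)
    # Test-by-family interaction columns; two of the six cells are empty.
    tf = drop_zero_columns(horizontal_direct_product(t, f))
    design = [row_t + row_f for row_t, row_f in zip(t, f)]
    half = half_stored_crossproducts(design)
    print(len(tf[0]), len(half), half[packed_index(2, 0, 5)])
```

A symmetric matrix of order p needs only p(p+1)/2 stored elements in this layout, which is the
saving in space that half-storage provides.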

A  half-stored  matrix  of  the  dot  products  of  the  design  columns  is  formed  and  either  kept 
in  common  memory  or  stored  in  temporary  disk  space  so  that  the  matrix  is  available  for  recall 
in the iterative solution process. The algorithm proceeds by modifying the matrix of dot products
such that the inverse of the covariance matrix for the observations ($V$) is enclosed by the column
specifiers in the dot products, as $X'X$ becoming $X'V^{-1}X$. This transfer is completed without
inversion of the total $V$ matrix. The identity used to accomplish this transfer is

if $V_h = \sigma_h Z_h Z_h' + V_{(h+1)}$ where $V_h$ is nonsingular;

then
$$V_h^{-1} = V_{(h+1)}^{-1} - \sigma_h V_{(h+1)}^{-1} Z_h \left(I_h + \sigma_h Z_h' V_{(h+1)}^{-1} Z_h\right)^{-1} Z_h' V_{(h+1)}^{-1}. \tag{5-1}$$

A compact form of equation 5-1 is obtained by pre-multiplying by $Z_i'$ and post-multiplying by
$Z_j$ where $h = 1, \ldots, k-1$ ($k$ = the total number of random factors), $\sigma_h$ is the prior associated with
random variable $h$, $V_k = \sigma_k I$, $V_1 = V$ and $Z_i$ is the portion of the design matrix for random
variable $i$ (Giesbrecht 1983). A partitioned matrix is formed in order to update $V_{(h+1)}^{-1}$ until $V_1^{-1}$
or $V^{-1}$ is obtained. This matrix is of the form:

$$\begin{bmatrix} I_h + \sigma_h Z_h' V_{(h+1)}^{-1} Z_h & \sqrt{\sigma_h}\, Z_h' V_{(h+1)}^{-1} (X|y|Z_1|\ldots|Z_{k-1}) \\ \sqrt{\sigma_h}\, (X|y|Z_1|\ldots|Z_{k-1})' V_{(h+1)}^{-1} Z_h & T_{h+1} \end{bmatrix} \tag{5-2}$$

where $T_{h+1} = (X|y|Z_1|\ldots|Z_{k-1})' V_{(h+1)}^{-1} (X|y|Z_1|\ldots|Z_{k-1})$.
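
The identity in equation 5-1 can be checked numerically for the first update step ($h = k-1$), where
$V_{(h+1)} = \sigma_k I$. The Python sketch below is an illustration under stated assumptions, not part of
GAREML: it assumes $Z_h$ is a 0/1 indicator matrix with exactly one 1 per row, so that
$I_h + \sigma_h Z_h' V_{(h+1)}^{-1} Z_h$ is diagonal and can be inverted element by element. All function names
and numeric values are hypothetical.

```python
def inverse_first_update(z, sigma_h, sigma_k):
    """V_h^{-1} from equation 5-1 with V_(h+1) = sigma_k * I.

    Assumes z is a 0/1 indicator matrix with exactly one 1 per row,
    so Z'Z is diagonal (the number of observations per level).
    """
    n, q = len(z), len(z[0])
    counts = [sum(row[c] for row in z) for c in range(q)]
    # Inverse of the diagonal matrix (I_h + sigma_h * Z' V^{-1} Z).
    inner_inverse = [1.0 / (1.0 + sigma_h * count / sigma_k) for count in counts]
    v_inverse = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            z_term = sum(z[i][c] * inner_inverse[c] * z[j][c] for c in range(q))
            identity = 1.0 if i == j else 0.0
            v_inverse[i][j] = identity / sigma_k - sigma_h * z_term / sigma_k ** 2
    return v_inverse


def direct_v(z, sigma_h, sigma_k):
    """V_h = sigma_h * Z Z' + sigma_k * I, formed explicitly for checking."""
    n, q = len(z), len(z[0])
    return [
        [
            sigma_h * sum(z[i][c] * z[j][c] for c in range(q))
            + (sigma_k if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


if __name__ == "__main__":
    # Five observations on a random factor with two levels.
    z = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
    product = matmul(direct_v(z, 2.0, 0.5), inverse_first_update(z, 2.0, 0.5))
    for row in product:
        print([round(value, 10) for value in row])
```

The only inverse formed has the order of the number of levels of the factor, not the number of
observations, which is the practical point of the identity.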

The sweep operator of Goodnight (1979) is applied to the upper left partition of the
matrix (equation 5-2) and the result of equation 5-1 is obtained. The matrix is sequentially
updated and swept until $T_1 = (X|y|Z_1|\ldots|Z_{k-1})' V^{-1} (X|y|Z_1|\ldots|Z_{k-1})$ is obtained. $T_1$ is then
swept on the columns for fixed effects ($X'V^{-1}X$). This sweep operation produces generalized least
squares estimates for fixed effects, results which can be scaled into predictions of random
variables, the residual sum of squares and all the necessary ingredients for assembling the
equation to solve for the variance components. The equation to be solved for the variance
components is

$$\underset{r \times r}{\{\mathrm{tr}(QV_iQV_j)\}}\ \underset{r \times 1}{\hat{\sigma}^2} = \underset{r \times 1}{\{y'QV_iQy\}}$$

then
$$\hat{\sigma}^2 = \{\mathrm{tr}(QV_iQV_j)\}^{-1}\{y'QV_iQy\}; \tag{5-3}$$

where $\{\mathrm{tr}(QV_iQV_j)\}$ is a matrix whose elements are $\mathrm{tr}(QV_iQV_j)$ where $i = 1$ to $r$ and
$j = 1$ to $r$, i.e., there is a row and column for every random variable in
the linear model;
tr is the trace operator, that is, the sum of the diagonal elements of a matrix;
$Q = V^{-1} - V^{-1}X(X'V^{-1}X)^{-}X'V^{-1}$ for $V$ as the covariance matrix of $y$ and $X$ as
the design matrix for fixed effects;
$V_i = Z_iZ_i'$ where the $i$'s are the random variables;
$\hat{\sigma}^2$ is the vector of variance component estimates; and
$r$ is the number of random variables in the model ($k-1$).
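
The sweep on the fixed-effect columns described above can be sketched in a few lines of Python.
This is a minimal illustration of the sweep operator, not the GAREML implementation: for brevity
it assumes $V = I$, so the swept matrix is $(X|y)'(X|y)$ and the result reduces to ordinary least
squares, and it assumes $X$ has full column rank. The function names are hypothetical.

```python
def sweep(a, k):
    """Sweep the square matrix a on pivot k and return the swept copy."""
    n = len(a)
    d = a[k][k]
    out = [row[:] for row in a]
    out[k] = [value / d for value in a[k]]
    for i in range(n):
        if i == k:
            continue
        b = a[i][k]
        out[i] = [a[i][j] - b * out[k][j] for j in range(n)]
        out[i][k] = -b / d
    out[k][k] = 1.0 / d
    return out


def least_squares_by_sweep(x, y):
    """Form T = (X|y)'(X|y), sweep on the X columns, return (estimates, RSS).

    Assumes V = I and X of full column rank (illustration only).
    """
    w = [row + [obs] for row, obs in zip(x, y)]
    p = len(x[0])
    size = p + 1
    t = [[sum(r[i] * r[j] for r in w) for j in range(size)] for i in range(size)]
    for k in range(p):
        t = sweep(t, k)
    estimates = [t[i][p] for i in range(p)]  # last column, rows of X
    residual_ss = t[p][p]                    # lower right corner
    return estimates, residual_ss


if __name__ == "__main__":
    x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    y = [1.0, 3.0, 4.0]
    print(least_squares_by_sweep(x, y))
```

After sweeping, the upper left block holds the inverse of the swept partition, the last column
holds the fixed-effect solutions, and the lower right corner holds the residual sum of squares, so
a single pass yields several of the quantities listed in the text.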

The entire procedure from forming $T_1$ to solving for the variance components continues
until the variance component estimates from the last iteration are no more different from the
estimates of the previous iteration than the convergence criterion specifies. The fixed effect
estimates and predictions of random variables are then those of the final iteration. The
asymptotic covariance matrix for the variance components is obtained as

$$\mathrm{Var}(\hat{\sigma}^2) = 2\{\mathrm{tr}(QV_iQV_j)\}^{-1} \tag{5-4}$$

by utilizing intermediate results from the solution for the variance components.

The coefficient matrix of Henderson's mixed model equations is formed in order to
calculate the covariance matrix for fixed and random effects. The covariance matrix for
observations is constructed using the variance component estimates from Giesbrecht's algorithm.
The coefficient matrix is

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + D^{-1} \end{bmatrix} \tag{5-5}$$

where $R$ is the error covariance matrix which in this application is $I\sigma^2_w$,
where $\sigma^2_w$ is the variance of random variable $w$ (equations 5-6 and 5-7);
$X$ is the fixed effects design matrix;
$Z$ is the random effects design matrix; and
$D$ is the covariance matrix for the random variables which, in this
application, has variance components on the diagonal and zeroes on the
off-diagonal (no covariance among random variables).

The generalized inverse of the matrix (equation 5-5) is the error covariance matrix of the fixed
effect estimates and random predictions assuming the covariance matrix for observations is known
without error.

Operating  GAREML 

While  GAREML  will  run  in  either  batch  or  interactive  mode,  we  focus  on  the  interactive 
PC-version  which  begins  by  prompting  the  analyst  to  answer  questions  determining  the  factors 
to be read from the data. Specifically, the analyst answers yes or no to these questions: 1) are
there multiple locations? 2) are there multiple blocks? 3) are there disconnected sets of full-sibs
(usually referring to disconnected half-diallels)? and 4) is the mating design half-sib or full-sib?
The  program  then  determines  the  proper  variables  to  read  from  the  data  as  well  as  the  most 
complicated  (number  of  main  factors  plus  interactions)  scalar  linear  model  allowed. 

The most complicated linear model allowed for full-sib observations is

$$y_{ijklm} = \mu + t_i + b_{ij} + set_o + g_k + g_l + s_{kl} + tg_{ik} + tg_{il} + ts_{ikl} + p_{ijkl} + w_{ijklm} \tag{5-6}$$

where $y_{ijklm}$ is the m-th observation of the kl-th cross in the j-th block of the i-th test;
$\mu$ is the population mean;
$t_i$ is the random or fixed variable test environment;
$b_{ij}$ is the random or fixed variable block;
$set_o$ is the random or fixed variable set, i.e., a variable is created so that
disconnected sets of half-diallels planted in the same experiment can be
analyzed in the same run or to analyze provenances and families within
provenance where provenance equals set; sets are assumed to be across test
environments and blocks with families nested within sets and interactions with
set are assumed unimportant;
$g_k$ is the random variable female general combining ability (gca);
$g_l$ is the random variable male gca;
$s_{kl}$ is the random variable specific combining ability (sca);
$tg_{ik}$ is the random variable test by female gca interaction;
$tg_{il}$ is the random variable test by male gca interaction;
$ts_{ikl}$ is the random variable test by sca interaction;
$p_{ijkl}$ is the random variable plot;
$w_{ijklm}$ is the random variable within-plot; and
there is no covariance between random variables in the model.
The assumptions utilized are that the variances for female and male random variables are equal
($\sigma^2_{g_k} = \sigma^2_{g_l} = \sigma^2_{gca}$) and that female and male environmental interactions are the same
($\sigma^2_{tg_{ik}} = \sigma^2_{tg_{il}} = \sigma^2_{tg}$).
The most complicated scalar linear model allowed for half-sib observations is

$$y_{ijkm} = \mu + t_i + b_{ij} + set_o + g_k + tg_{ik} + p^{h}_{ijk} + w^{h}_{ijkm} \tag{5-7}$$

where $y_{ijkm}$ is the m-th observation of the k-th half-sib family in the j-th block of the i-th test;
$\mu$, $t_i$, $b_{ij}$, $set_o$, $g_k$, and $tg_{ik}$ retain the definition in the full-sib equation;
$p^{h}_{ijk}$ is the random variable plot containing different genotype by environment
components than the full-sib model;
$w^{h}_{ijkm}$ is the random variable within-plot containing different levels of
genotypic and genotype by environment components than the full-sib model;
and there is no covariance between random variables in the model.

The  analyst  builds  the  linear  model  by  answering  further  prompts.  If  test,  block  and/or 
set  are  in  the  model,  they  must  be  declared  as  fixed  or  random  effects.  When  any  of  the  three 
effects  is  declared  random,  the  analyst  must  furnish  prior  values  for  the  variance.  If  no  prior 
value  is  known,  1.0's  may  be  used  as  priors.  Using  1.0's  as  priors  will  not  affect  the  values  for 
resulting  variance  component  estimates  within  the  constraints  of  the  convergence  criterion;  but 
there  may  be  a  time  penalty  due  to  increasing  the  number  of  iterations  required  for  convergence. 
All  remaining  factors  in  the  model  are  treated  as  random  variables. 

To  complete  the  definition  of  the  model,  the  analyst  chooses  to  include  or  exclude  each 
possible  factor  by  answering  yes  or  no  when  prompted.  After  each  yes  answer,  the  program  asks 
for  a  prior  value  for  the  variance.  Again,  if  no  known  priors  exist,  1.0's  may  be  substituted. 
After  the  model  has  been  specified,  the  program  counts  the  number  of  fixed  effects  and  the 
number  of  random  effects  and  asks  if  the  number  fits  the  model  expected.  A  "yes"  answer 
proceeds  through  the  program  while  a  "no"  returns  the  program  to  the  beginning. 

GAREML  is  now  ready  to  read  the  data  file  (which  must  be  an  ASCII  data  file)  in  this 
order:  test,  block,  set,  female,  male,  and  the  response  variable.  The  analyst  is  prompted  to 
furnish  a  proper  FORTRAN  format  statement  for  the  data.  Test,  block,  set,  female  and  male  are 
read as character variables (A fields) with as many as eight characters per field, while the data
vector (response variable) is read as a double precision variable (F field). An example of a
format statement for a full-sib mating design across locations and blocks is "(4A8,F10.5)" which
reads four character variables sequentially occupying 8 columns each and the response variable
beginning in column 33 and ending in column 42 having five decimal places.
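
Before running GAREML it can be useful to confirm that a data file matches the intended column
layout. The Python sketch below mimics how a record would be split under "(4A8,F10.5)"; it is a
checking aid written for this illustration, not part of GAREML, and the function name, defaults and
sample labels are hypothetical. It assumes the standard FORTRAN rule that an F field containing an
explicit decimal point is read as written, while a field without one has its last five digits
taken as decimals.

```python
def read_record(line, n_char_fields=4, char_width=8, real_width=10, decimals=5):
    """Split one fixed-width record laid out as (4A8,F10.5).

    Returns the character fields (blanks stripped for convenience) and the
    response value. A blank or non-numeric response raises ValueError, which
    flags missing values or misaligned columns.
    """
    labels = [
        line[i * char_width:(i + 1) * char_width].strip()
        for i in range(n_char_fields)
    ]
    start = n_char_fields * char_width  # response begins in column 33
    raw = line[start:start + real_width].strip()
    if "." in raw:
        value = float(raw)                 # explicit decimal point is honored
    else:
        value = int(raw) / 10 ** decimals  # implied decimal places
    return labels, value


if __name__ == "__main__":
    # Field order for this layout: test, block, female, male, response.
    record = "{:<8}{:<8}{:<8}{:<8}{:>10}".format(
        "TEST1", "BLK2", "P01", "P07", "12.34500"
    )
    print(read_record(record))
```

Running such a check over every line of the file before analysis would catch the blank lines,
missing values and misaligned fields that the text below lists as probable causes of read errors.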

After  reading  the  data,  GAREML  begins  to  furnish  information  to  the  analyst.  This 
information  should  be  scanned  to  make  sure  the  data  read  are  correct.  This  information  includes 
the  number  of  parents,  the  number  of  full-sib  crosses,  the  number  of  observations,  the  maximum 
number  of  fixed  effect  design  matrix  columns,  and  the  maximum  number  of  random  effect  design 
matrix columns. If there is an error at this point, use CTRL-BRK to exit the program. Probable
causes of errors are that the data are not in the format specified, missing values are included,
blank lines or other similar errors are in the data file, or the model was not correctly specified.

At  this  point,  there  are  three  other  prompts  concerning  the  data  analysis  (number  of 
iterations,  convergence  criterion  and  treatment  of  negative  variance  components).  The  number 
of  iterations  is  arbitrarily  set  to  30  and  can  be  changed  at  the  analyst's  discretion.  No  warning 
is  issued  that  the  maximum  number  of  iterations  has  been  reached;  however,  the  current  iteration 
number  and  variance  component  estimates  are  output  to  the  screen  at  the  beginning  of  each 
iteration.  The  convergence  criterion  used  is  the  sum  of  the  absolute  values  of  the  difference 
between variance component estimates for consecutive iterations. The criterion has been set to
$1 \times 10^{-4}$, meaning that convergence is required to the fourth decimal place for all variance
components. The convergence criterion should be modified to suit the magnitude of the variances
under  consideration  as  well  as  the  practical  need  for  enhanced  resolution.  Enhanced  resolution 
is  obtained  at  the  cost  of  increasing  the  number  of  iterations  to  convergence. 
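
The stopping rule described above can be sketched as follows. The Python code is a hedged
illustration of the rule only: `update` stands in for one full pass of the estimation procedure,
and `toy_update` is a hypothetical placeholder that simply moves each value halfway toward a
target. Unlike GAREML, this sketch reports explicitly whether the iteration limit was reached
before convergence.

```python
def has_converged(previous, current, criterion=1e-4):
    """Sum of absolute changes in the estimates between consecutive iterations."""
    return sum(abs(c - p) for p, c in zip(previous, current)) <= criterion


def iterate(update, priors, max_iterations=30, criterion=1e-4):
    """Iterate until convergence or the iteration limit.

    Returns (estimates, iterations used, converged flag).
    """
    estimates = list(priors)
    for iteration in range(1, max_iterations + 1):
        new_estimates = update(estimates)
        if has_converged(estimates, new_estimates, criterion):
            return new_estimates, iteration, True
        estimates = new_estimates
    return estimates, max_iterations, False


def toy_update(values, target=2.0):
    """Hypothetical stand-in for one estimation pass (not a REML step)."""
    return [0.5 * (v + target) for v in values]


if __name__ == "__main__":
    print(iterate(toy_update, [1.0]))
    print(iterate(toy_update, [1.0], max_iterations=5))
```

Because the criterion is an absolute sum, a value of $1 \times 10^{-4}$ is strict for variances near
0.01 and loose for variances in the thousands, which is why the criterion should be scaled to the
data at hand.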

The  analyst  must  decide  whether  to  accept  and  use  negative  estimates  or  to  set  negative 
estimates to zero and re-solve the system. The latter solution results in variance component
estimates with lower sampling variance and slight bias. If one is interested in unbiased estimates
of variance components that have a high probability of negative estimates, then accepting and
using the negative estimates may be the proper course to take.

Interpreting  GAREML  Output 

Analysis  is  now  underway.  The  priors  for  each  iteration  and  the  iteration  number  are 
printed  out  to  the  screen.  GAREML  continues  to  iterate  until  the  convergence  criterion  is  met 
or  the  maximum  number  of  iterations  is  reached.  The  next  time  that  analyst  intervention  is 
required  is  to  provide  a  name  for  the  output  file  for  variance  component  estimates.  The  file  name 
follows  normal  DOS  file  naming  protocol;  however,  alternative  directories  may  not  be  specified, 
i.e.,  all  outputs  will  be  found  in  the  same  directory  as  the  data  file.  The  program  will  now  quiz 
the analyst to determine if additional outputs are desired. These additional outputs are gca
predictions, sca predictions (if applicable), the asymptotic covariance matrix for the variance
components, generalized least squares fixed effect estimates, error covariance matrix of the gca
predictions and error covariance matrix for fixed effects. An answer of yes to the inclusion of
an output will result in a prompt for a file name. In addition, for gca and sca predictions the
analyst may input a different value for $\sigma^2_{gca}$ or $\sigma^2_{sca}$ with which to scale predictions. The
discussion which follows furnishes more detailed information concerning GAREML outputs.

Variance  Component  Estimates 

Ignoring  concerns  about  convergence  to  a  global  maximum  and  negative  values,  variance 
component  estimates  are  restricted  maximum  likelihood  estimates  of  Patterson  and  Thompson 
(1971).  The  estimates  are  robust  against  starting  values  (priors),  i.e.,  the  same  estimates,  within 
the limits of the convergence criterion, can be obtained from diverse priors. However, priors
close to the true values will, in general, reduce the number of iterations required to reach
convergence. The value of the convergence criterion must be less than or equal to the desired
precision for the variance components. REML variance component estimates from this program
have been shown to have more desirable properties (variance and bias) than other commonly used
estimation techniques (maximum likelihood, minimum norm quadratic unbiased estimation and
Henderson's Method 3) over a wide range of data imbalance. The properties of the estimates are
further enhanced by using individual observations as data rather than plot means. The output is
labelled by the variance component estimated.

Predictions  of  Random  Variables 

The predictions output are for general and specific combining abilities and are approximate
best linear unbiased predictions (BLUP) of the random variables. BLUP predictions have several
optimal properties: 1) the correlation between the predicted and true values is maximized; and 2) if
the distribution is multivariate normal then BLUP maximizes the probability of obtaining the
correct rankings (Henderson 1973) and so maximizes the probability of selecting the best
candidate from any pair of candidates (Henderson 1977).

Predictions are of the form:

$$\hat{u} = DZ'V^{-1}(y - X\hat{\beta}) \tag{5-8}$$

where $\hat{u}$ is the vector of predictions;
$D$ is the estimated covariance matrix for random variables from the REML
variance component estimates, see equation 5-5;
$Z'$ is the transpose of the design matrix for random variables;
$y$ is the data vector;
$X$ is the design matrix for fixed effects;
$\hat{\beta}$ is the vector of fixed effect estimates; and
$V$ is the estimated covariance matrix for observations from REML variance
component estimates.
NOTE: if predictions are desired based on prior values for the variance components, set the
number of iterations to 1 after having input the desired values as priors.
Predictions are output as a labelled vector.
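
For intuition, equation 5-8 reduces to a simple shrinkage formula in one special case. The Python
sketch below assumes a balanced one-way random model (an overall mean as the only fixed effect, one
random family effect, n observations per family) with known variance components; under those
assumptions each prediction is the family-mean deviation multiplied by
$n\sigma^2_g / (\sigma^2_w + n\sigma^2_g)$. It is not the general computation GAREML performs for unbalanced
data, and the function name and numeric values are hypothetical.

```python
def blup_balanced_one_way(groups, sigma2_g, sigma2_w):
    """Predictions for family effects in a balanced one-way random model.

    Special case of u = D Z' V^{-1} (y - X b): with equal family sizes the
    GLS estimate of the mean is the grand mean, and each family deviation
    is shrunk toward zero by n * sigma2_g / (sigma2_w + n * sigma2_g).
    """
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch requires equal family sizes")
    family_means = [sum(g) / n for g in groups]
    grand_mean = sum(family_means) / len(family_means)
    shrinkage = n * sigma2_g / (sigma2_w + n * sigma2_g)
    return [shrinkage * (m - grand_mean) for m in family_means]


if __name__ == "__main__":
    data = [[10.0, 12.0], [14.0, 16.0], [18.0, 20.0]]
    print(blup_balanced_one_way(data, sigma2_g=1.0, sigma2_w=2.0))
```

The shrinkage factor grows toward one as family size or the family variance increases, so
well-tested parents are regressed less toward the mean than poorly tested ones.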

Asymptotic  Covariance  Matrix  of  Variance  Components 

The  output  for  the  asymptotic  covariance  matrix  (AVCM)  of  variance  components  is  from 
equation  5-4.  This  output  represents  the  variance  of  repeated  minimum  variance  quadratic 
unbiased  variance  component  estimates  using  the  same  experimental  design  if  the  estimates  are 
equal  to  the  true  values. 

This  technique  has  been  used  for  simulation  work  to  define  optimal  mating  and  field 
designs  (McCutchan  et  al.  1989).  The  AVCM  is  used  to  create  the  asymptotic  variance  of  linear 
combinations  of  estimates  of  variance  components  as 

$$\mathrm{Var}(L'\hat{\sigma}^2) = L'\,\mathrm{Var}(\hat{\sigma}^2)\,L \tag{5-9}$$

where $L$ specifies the linear combination(s) of variance components;
$\hat{\sigma}^2$ is the vector of variance component estimates; and
$\mathrm{Var}(\hat{\sigma}^2)$ is the AVCM from equation 5-4.
The diagonal elements of $L'\,\mathrm{Var}(\hat{\sigma}^2)\,L$ are the variances of the linear combinations and the
off-diagonal elements are the covariances between the linear combinations. These values are then
useful for Taylor series approximation of the variance of a ratio of linear combinations such as
heritability. AVCM is output as a vector (half-stored matrix) and each row of the output is
labelled.
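
The use of equation 5-9 for a ratio can be sketched as follows. The Python code applies the
first-order Taylor series (delta method) approximation
$\mathrm{Var}(a/b) \approx (a/b)^2[\mathrm{Var}(a)/a^2 + \mathrm{Var}(b)/b^2 - 2\,\mathrm{Cov}(a,b)/(ab)]$ to a heritability
written as a ratio of two linear combinations. The two-component model, the estimates and the
AVCM values are hypothetical numbers chosen for illustration, not results from this study.

```python
def quadratic_form(left, matrix, right):
    """left' * matrix * right for plain Python lists."""
    return sum(
        left[i] * matrix[i][j] * right[j]
        for i in range(len(left))
        for j in range(len(right))
    )


def ratio_variance(l_num, l_den, estimates, avcm):
    """First-order Taylor series variance of (l_num'est) / (l_den'est)."""
    num = sum(l * e for l, e in zip(l_num, estimates))
    den = sum(l * e for l, e in zip(l_den, estimates))
    var_num = quadratic_form(l_num, avcm, l_num)  # diagonal element of L'Var L
    var_den = quadratic_form(l_den, avcm, l_den)  # diagonal element of L'Var L
    cov = quadratic_form(l_num, avcm, l_den)      # off-diagonal element
    ratio = num / den
    return ratio ** 2 * (
        var_num / num ** 2 + var_den / den ** 2 - 2.0 * cov / (num * den)
    )


if __name__ == "__main__":
    # Hypothetical half-sib example with two components: gca and within.
    estimates = [0.025, 0.975]
    avcm = [[0.0004, -0.00001], [-0.00001, 0.002]]
    l_num = [4.0, 0.0]  # numerator of heritability: 4 * sigma2_gca
    l_den = [1.0, 1.0]  # denominator: phenotypic variance in this toy model
    print(ratio_variance(l_num, l_den, estimates, avcm))
```

The approximation is only as good as the AVCM itself, which is asymptotic, so it should be
read as a guide to relative precision rather than an exact sampling variance.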

Fixed Effect Estimates

Fixed  effect  estimates  are  those  of  generalized  least  squares  and  are  in  a  set  to  zero 
format.  Set  to  zero  format  (commonly  seen  in  SAS3  output)  is  characterized  by  the  last  level  of 
a main effect or nested effect being set to zero. These estimates are only approximately best linear
unbiased estimates (BLUE) of the fixed effects because the covariance matrix for observations
was estimated rather than known without error. Kackar and Harville (1981) have shown, for a broad
class  of  variance  estimators,  that  the  fixed  effects  estimates  are  still  unbiased.  The  word  "Best" 
in  BLUE  refers  to  the  properties  of  minimum  variance  for  the  class  of  unbiased  estimators. 

Generalized  least  squares  estimates,  in  set  to  zero  format,  for  fixed  effects  are  of  the 
form: 

6  =  (X'V'XyX'V'y  5-10 

where    6,  X,  V  and  y  are  as  defined  in  equation  5-8. 
Fixed  effect  estimates  are  output  as  a  labelled  vector. 

Error  Covariance  Matrices 

The  error  covariance  matrices  for  predictions  and  fixed  effect  estimates  are  obtained  by 
producing  a  generalized  inverse  of  equation  5-5  (Henderson  1984,  McLean  1989).  Since  all 
covariance  matrices  are  symmetric,  the  output  is  in  the  form  of  a  vector  which  is  equivalent  to 
a half-stored matrix. Output for error of gca predictions is labelled while the error of fixed
effects is not. The labelling on gca errors makes the unlabelled output for fixed effect variance
self-explanatory. The error covariance matrix for gca predictions can be converted to the
covariance matrix for gca predictions by forming the covariance matrix for the gca random
variables and subtracting the error covariance matrix. The covariance matrix for predictions has
been denoted as Var(ĝ) by White and Hodge (1989).

³SAS is the registered trademark of SAS Institute Inc., Cary, North Carolina.

Example 

The  following  discussion  involves  the  analysis  of  a  simulated  data  set  in  order  to  further 
demonstrate  the  outputs  of  GAREML. 

Data 

The  data  (Table  5-1)  was  generated  using  a  six-parent  half-diallel  mating  design  and  a 
randomized  complete  block  field  design.  The  field  design  is  in  two  locations  with  four  complete 
blocks  per  location  and  two  trees  per  family  per  block.  The  underlying  genetic  parameters  for 
the  data  are  individual  tree  heritability  equals  0.25,  Type  B  correlation  equals  0.8,  dominance  to 
additive  variance  ratio  equals  0.25  and  the  population  mean  equals  15.0.  After  a  balanced  data 
set  was  generated,  the  observations  were  subjected  to  40%  random  deletion  (simulating  60% 
survival).  The  data  set  is  comprised  of  a  small  number  of  observations  and  while  not  an  optimal 
application  of  GAREML  serves  well  as  an  illustration. 

Analysis 

The analysis was carried out with two different linear models using individual
observations as the data. The full model contained eight sources of variation and was that of
equation 5-6 without the set variable. In model 1, test environment and blocks within test are
declared fixed. The subsequent model (model 2) treats all effects except the mean as random.  Variance

Table 5-1. Data for example of GAREML operation. L, Bl, F, M, T and RV stand for location,
block, female, male, tree and response variable, respectively. A proper FORTRAN read format
would be (A2,T5,A2,T9,A2,T13,A2,T22,F10.5).

L   Bl  F   M   T   RV
1   1   1   2   1   19.07165
1   1   1   3   1   13.17908
1   1   1   6   1   14.33610
1   1   1   6   2   12.48194
1   1   2   3   1    7.57821
1   1   2   3   2   12.73262
1   1   2   5   1   18.38451
1   1   2   5   2    9.84538
1   1   2   6   1   15.60306
1   1   2   6   2   17.44872
1   1   3   4   1   14.59613
1   1   3   5   1   16.95861
1   1   3   5   2   15.02863
1   1   3   6   1   15.95634
1   1   4   5   1   19.13362
1   1   4   5   2   12.08240
1   1   4   6   1    5.37647
1   1   5   6   1   18.87956
1   2   1   3   2   16.79470
1   2   1   5   1   15.81553
1   2   1   5   2   19.77063
1   2   1   6   1   17.49746
1   2   1   6   2   18.81207
1   2   2   3   1   15.03569
1   2   2   5   1   11.68149
1   2   2   6   2   12.78227
1   2   3   4   1   13.39599
1   2   3   5   1   13.54873
1   2   3   5   2   12.00935
1   2   3   6   1   16.89523
1   2   3   6   2   20.48223
1   2   4   5   1   15.21563
1   2   4   6   1   14.21138
1   2   4   6   2   15.65649
1   2   5   6   1   21.36959
1   2   5   6   2   16.39244
1   3   1   3   1   18.83196
1   3   1   3   2   20.45754
1   3   1   4   1   14.10900
1   3   1   4   2   16.49369
1   3   1   6   2   14.25154
1   3   2   3   1   19.57695
1   3   2   5   2   12.38303

Table 5-1--continued

L   Bl  F   M   T   RV
1   3   2   6   2   17.12110
1   3   3   4   1   13.03351
1   3   3   4   2   13.20463
1   3   3   5   2   12.44908
1   3   4   5   1   14.28528
1   3   5   6   1   17.57996
1   3   5   6   2   16.57026
1   4   1   3   1   16.91731
1   4   1   3   2   18.36209
1   4   1   4   2   16.70828
1   4   1   5   2   21.29535
1   4   1   6   1   15.23314
1   4   2   3   1   12.14596
1   4   2   3   2   12.20679
1   4   2   4   1   11.83520
1   4   2   6   1   14.27080
1   4   3   4   1   14.34923
1   4   3   4   2   16.39791
1   4   3   5   1   12.17513
1   4   3   5   2   14.95300
1   4   4   5   2   11.63311
1   4   4   6   1   13.29654
1   4   4   6   2   15.90303
1   4   5   6   1   17.22657
1   4   5   6   2   10.04577
2   1   1   2   2    9.80034
2   1   1   3   1   12.12891
2   1   1   3   2   18.00497
2   1   1   4   1   12.68041
2   1   1   4   2   13.14452
2   1   1   6   1   19.19915
2   1   2   3   1    5.36263
2   1   2   3   2   13.39351
2   1   2   5   2   11.13499
2   1   2   6   1   13.46429
2   1   2   6   2   16.87729
2   1   3   4   2    9.24115
2   1   3   6   2   13.49004
2   1   4   5   1   11.88620
2   1   4   5   2    9.83032
2   1   4   6   1   11.46474
2   1   4   6   2   12.68435
2   1   5   6   1   16.66260
2   1   5   6   2   14.14226
2   2   1   2   1   15.77378

Table 5-1--continued

L   Bl  F   M   T   RV
2   2   1   3   1   13.28328
2   2   1   4   1   11.22915
2   2   1   4   2    9.94041
2   2   1   5   2   14.03251
2   2   1   6   2   20.41990
2   2   2   3   1   10.74312
2   2   2   4   1    6.72215
2   2   2   5   1   12.77779
2   2   2   5   2   11.10388
2   2   3   4   1   12.52286
2   2   3   5   1    8.02745
2   2   4   5   1   14.14567
2   2   4   5   2   11.85937
2   2   4   6   2   14.61252
2   2   5   6   1   10.56892
2   2   5   6   2   14.13368
2   3   1   2   1   21.17819
2   3   1   3   1   13.56761
2   3   1   4   1    9.35457
2   3   1   5   1   13.78936
2   3   1   6   1   11.12412
2   3   2   3   1    9.41810
2   3   2   3   2   12.77555
2   3   2   4   1   15.38449
2   3   2   4   2    9.64170
2   3   2   5   2   11.64608
2   3   2   6   1   11.79241
2   3   2   6   2    9.14105
2   3   3   4   1    8.92909
2   3   3   6   1    8.08095
2   3   4   5   1   10.13996
2   3   4   5   2   10.30808
2   3   4   6   1    9.88286
2   3   4   6   2    8.80803
2   3   5   6   1   11.65281
2   3   5   6   2    7.90006
2   4   1   3   1   12.72744
2   4   1   3   2   14.44072
2   4   1   4   1   14.67983
2   4   1   5   1    9.27305
2   4   1   5   2   16.99880
2   4   1   6   1   14.17835
2   4   2   3   1   14.14628
2   4   2   3   2   10.64403

Table 5-1--continued

L   Bl  F   M   T   RV
2   4   2   4   1   16.55552
2   4   2   5   1   10.30221
2   4   2   5   2   13.24760
2   4   3   4   2    8.44671
2   4   3   5   1   14.12292
2   4   3   5   2   14.17583
2   4   3   6   1   13.92882
2   4   3   6   2   16.18924
2   4   4   5   1    8.89750
2   4   4   5   2    9.79576
2   4   4   6   1   12.29319
2   4   4   6   2    9.16987
2   4   5   6   1   14.85018
2   4   5   6   2   16.69414

components are estimated with model 1 receiving two different treatments of negative estimates,
i.e., accept and live with the negative estimates (model 1A) or re-solve the system setting
negative estimates to zero (model 1B). The different models and methods for dealing with negative
estimates are demonstrated so that the reader can see a range of outputs from GAREML.

Output 

Variance  component  estimates 

The variance component estimates are

Model 1A
SIGMA-SQUARED GCA            1.221435
SIGMA-SQUARED SCA            0.233278
SIGMA-SQUARED LOCxGCA       -0.096850
SIGMA-SQUARED LOCxSCA       -0.548142
SIGMA-SQUARED BLOCKxFAM      1.242110
SIGMA-SQUARED ERROR          7.285051;

Model 1B
SIGMA-SQUARED GCA            1.160636
SIGMA-SQUARED SCA            0.003190
SIGMA-SQUARED LOCxGCA        0.000000
SIGMA-SQUARED LOCxSCA        0.000000
SIGMA-SQUARED BLOCKxFAM      0.753049
SIGMA-SQUARED ERROR          7.375388; and

Model 2
SIGMA-SQUARED LOCATION       3.430921
SIGMA-SQUARED BLOCK(LOC)     0.000000
SIGMA-SQUARED GCA            1.233609
SIGMA-SQUARED SCA            0.000000
SIGMA-SQUARED LOCxGCA        0.000000
SIGMA-SQUARED LOCxSCA        0.000000
SIGMA-SQUARED BLOCKxFAM      0.960168
SIGMA-SQUARED ERROR          7.197284.

These variance component estimates illustrate outputs for the random model, the mixed model
and the alternatives for dealing with negative estimates.

Fixed  effect  estimates 

Fixed effect estimates are

Model 1B
MU               13.085052
LOCATION 1        1.805455
LOCATION 2        0.000000
BLOCK(LOC) 1     -0.475396
BLOCK(LOC) 2      0.856959
BLOCK(LOC) 3      0.844716
BLOCK(LOC) 4      0.000000
BLOCK(LOC) 5     -0.219529
BLOCK(LOC) 6     -0.526635
BLOCK(LOC) 7     -1.682449
BLOCK(LOC) 8      0.000000; and

Model 2
MU               13.809567.

The interpretation of fixed effect estimates for model 1B is that blocks 1 through 4 belong with
location 1 and the fourth block is set to zero. Blocks 5 through 8 are those of location 2 and the
eighth block is set to zero, as is location 2. Sets of blocks within location can always be
determined by the last block within a location being set to zero. Under the set-to-zero format,
MU is the mean of the fourth block (labelled block 8) in location 2, and any estimable
function of the fixed effects can be generated from these estimates. An example of an estimable
function would be the site mean of location 1. This mean would be estimated as MU +
LOCATION 1 + 1/4(BLOCK(LOC) 1 + BLOCK(LOC) 2 + BLOCK(LOC) 3 + BLOCK(LOC) 4).
MU of model 2 is the estimate of the general mean across sites if all other factors are
random. All of these estimates are the result of generalized least squares estimation.
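
The site-mean calculation described above can be sketched in a few lines of Python. The estimates are copied from the model 1B output; the dictionary layout and the function name `site_mean` are assumptions made only for this illustration and are not part of GAREML.

```python
# Fixed effect estimates for model 1B in set-to-zero format.
mu = 13.085052
location = {1: 1.805455, 2: 0.000000}
block = {1: -0.475396, 2: 0.856959, 3: 0.844716, 4: 0.000000,
         5: -0.219529, 6: -0.526635, 7: -1.682449, 8: 0.000000}

# Blocks 1-4 belong to location 1 and blocks 5-8 to location 2.
blocks_in_location = {1: [1, 2, 3, 4], 2: [5, 6, 7, 8]}


def site_mean(loc):
    """Estimable function: MU + LOCATION + average of that location's blocks."""
    blocks = blocks_in_location[loc]
    return mu + location[loc] + sum(block[b] for b in blocks) / len(blocks)


for loc in (1, 2):
    print("site mean of location", loc, "=", round(site_mean(loc), 6))
```

Because the block effects are averaged with equal weights, the result does not depend on which block happened to be set to zero.
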
Asymptotic  covariance  matrix  for  the  variance  components 

The asymptotic covariance matrix for the variance components in model 1B would appear
as

ASYMPTOTIC VARIANCE COVARIANCE MATRIX

GCA          GCA           0.7902569240
GCA          SCA          -0.0490465017
GCA          LOCxGCA       0.0000000000
GCA          LOCxSCA       0.0000000000
GCA          BLOCKxFAM     0.0003970615
GCA          ERROR        -0.0001155675
SCA          SCA           0.2047376344
SCA          LOCxGCA       0.0000000000
SCA          LOCxSCA       0.0000000000
SCA          BLOCKxFAM    -0.1319741909
SCA          ERROR        -0.0020057997
LOCxGCA      LOCxGCA       0.0000000000
LOCxGCA      LOCxSCA       0.0000000000
LOCxGCA      BLOCKxFAM     0.0000000000
LOCxGCA      ERROR         0.0000000000
LOCxSCA      LOCxSCA       0.0000000000
LOCxSCA      BLOCKxFAM     0.0000000000
LOCxSCA      ERROR         0.0000000000
BLOCKxFAM    BLOCKxFAM     1.6336304265
BLOCKxFAM    ERROR        -1.2680804956
ERROR        ERROR         2.0069152440

This matrix, like all other matrices output, is half-stored. The output is read as follows:
the row labelled "GCA GCA" is the asymptotic variance of the gca variance component. The next
row, labelled "GCA SCA", is the asymptotic covariance between the estimates of the gca variance
component and the sca variance component. The next four rows are likewise the asymptotic
covariances of the gca variance component estimate with the estimates of the other variance
components in the model. The other rows are read in a like manner, and if the analyst wishes to
array the output as a matrix, all necessary components are at hand.
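
For an analyst who does wish to array the half-stored output as a full matrix, a minimal Python sketch follows. It assumes the vector is stored row by row from the diagonal rightward, which is the order of the row labels in the listing; the function name `unpack_half_stored` is hypothetical. The values are those printed for model 1B.

```python
def unpack_half_stored(vec, n):
    """Array a half-stored symmetric matrix (row-wise upper triangle,
    diagonal included) as a full n x n list of lists."""
    if len(vec) != n * (n + 1) // 2:
        raise ValueError("vector length does not match n(n+1)/2")
    full = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            full[i][j] = full[j][i] = vec[k]
            k += 1
    return full


names = ["GCA", "SCA", "LOCxGCA", "LOCxSCA", "BLOCKxFAM", "ERROR"]
avcm_vec = [
    0.7902569240, -0.0490465017, 0.0, 0.0, 0.0003970615, -0.0001155675,
    0.2047376344, 0.0, 0.0, -0.1319741909, -0.0020057997,
    0.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 0.0,
    1.6336304265, -1.2680804956,
    2.0069152440,
]

avcm = unpack_half_stored(avcm_vec, len(names))
for name, row in zip(names, avcm):
    print(name.ljust(10), " ".join("%13.10f" % v for v in row))
```

A vector of six components has 6(6+1)/2 = 21 stored elements, which matches the 21 rows of the listing.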

Predictions  of  random  variables 

All predictions of random variables are appropriately labelled according to the character
name read from the data and for model 1B would appear as

(from the gca output)

GCA  1        1.573253
GCA  2       -0.356262
GCA  3       -0.423469
GCA  4       -1.310747
GCA  5       -0.054977
GCA  6        0.572202;

(from the sca output)

SCA  1  2     0.003806
SCA  1  3     0.002662
SCA  1  4    -0.002028
SCA  1  5     0.001562
SCA  1  6    -0.001678
SCA  2  3    -0.003976
SCA  2  4     0.001827
SCA  2  5    -0.003550
SCA  2  6     0.000914
SCA  3  4    -0.000036
SCA  3  5    -0.002495
SCA  3  6     0.002681
SCA  4  5     0.000656
SCA  4  6    -0.004021
SCA  5  6     0.003676.

All  these  predictions  are  approximately  best  linear  unbiased  predictions  and  are  approximate 
because  the  variance  components  were  estimated  from  the  same  data. 

Error covariance matrix of the predictions

The error covariance matrix of the predictions is output as a half-stored matrix with each
row appropriately labelled. This matrix for model 1B appears as

THE ERROR VARIANCE COVARIANCE MATRIX FOR GCA ARRAYED AS A VECTOR

0.3618685934    1    1
0.1692300980    1    2
0.1465129987    1    3
0.1583039830    1    4
0.1713608386    1    5
0.1533590404    1    6
0.3687218966    2    2
0.1382132356    2    3
0.1730487382    2    4
0.1543784409    2    5
0.1570431430    2    6
0.3545855963    3    3
0.1622943256    3    4
0.1744667783    3    5
0.1845626177    3    6
0.3518724881    4    4
0.1567087948    4    5
0.1584072224    4    6
0.3466599143    5    5
0.1570607852    5    6
0.3502027434    6    6


The labelling of the output is interpreted identically to that for the asymptotic variance
covariance matrix for the variance components. Rows which contain the same parental name twice
give the error variance for that parental prediction, and rows containing two different parental
names give the error covariance between the two parental predictions. In this unbalanced case the
reader will see that some parents have more error associated with their predictions than others,
e.g., compare the error for parent 2 with that for parent 5. This is true because of the varying
number of observations associated with the prediction for each parent and also the varying
distribution of those observations across tests and blocks. If one assumes that the estimate for
gca variance from the data equals the true variance for gca, then the correlation of the
prediction with the true value (Corr(ĝ,g), White and Hodge 1989) for parent 5 is equal to
$\sqrt{1 - (0.347/1.161)}$ or 0.84.
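
The same arithmetic can be applied to every parent. The Python sketch below takes the diagonal elements of the error covariance matrix and the model 1B estimate of gca variance, and computes both the variance of each prediction (gca variance minus error variance) and Corr(ĝ,g). The names `error_variance` and `prediction_accuracy` are illustrative assumptions, and the calculation rests on the assumption that the estimated gca variance equals the true value.

```python
import math

sigma2_gca = 1.160636  # model 1B estimate of gca variance

# Diagonal elements (error variances) of the error covariance matrix for gca.
error_variance = {
    1: 0.3618685934, 2: 0.3687218966, 3: 0.3545855963,
    4: 0.3518724881, 5: 0.3466599143, 6: 0.3502027434,
}


def prediction_accuracy(err_var, gen_var):
    """Correlation of a prediction with its true value: sqrt(1 - err/gen)."""
    return math.sqrt(1.0 - err_var / gen_var)


for parent in sorted(error_variance):
    err = error_variance[parent]
    var_pred = sigma2_gca - err  # variance of the prediction itself
    corr = prediction_accuracy(err, sigma2_gca)
    print("parent %d  Var(prediction) = %.4f  Corr = %.2f" % (parent, var_pred, corr))
```

Parents with larger error variances, such as parent 2, receive correspondingly lower correlations.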

Error  covariance  matrix  for  the  fixed  effects 

The error covariance matrix for the fixed effects is output as a half-stored matrix. The
output is not labelled; however, one only has to know the total number of levels for all fixed
effects to assign labels if needed. The primary use of this matrix is to estimate the variance of
estimable functions of the fixed effects. If $l$ denotes the vector containing the specification
of an estimable function and $V_b$ denotes the error covariance matrix for fixed effects, then
the variance of an estimable function is equal to $l'V_b l$. For the mean of test 1, $l'$ equals
[1 1 0 1/4 1/4 1/4 1/4 0 0 0 0].
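
A short Python sketch of this quadratic form, evaluated directly on the half-stored vector, is given below. It assumes the vector holds the upper triangle row by row with the diagonal included, in the same order as the labelled outputs; the function name is hypothetical, and because the fixed effect error covariance matrix is not reproduced in this chapter, the vector used here is only a placeholder identity matrix.

```python
def quadratic_form_half_stored(l, v_half):
    """Return l'Vl for a symmetric V stored as a row-wise upper triangle.
    Off-diagonal elements are stored once, so they are counted twice."""
    n = len(l)
    if len(v_half) != n * (n + 1) // 2:
        raise ValueError("vector length does not match n(n+1)/2")
    total = 0.0
    k = 0
    for i in range(n):
        for j in range(i, n):
            weight = 1.0 if i == j else 2.0
            total += weight * l[i] * l[j] * v_half[k]
            k += 1
    return total


# l' for the mean of test 1: MU, LOCATION 1-2, BLOCK(LOC) 1-8.
l = [1.0, 1.0, 0.0, 0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0]

# Placeholder only: an 11 x 11 identity matrix in half-stored form.
n = len(l)
vb_half = [1.0 if i == j else 0.0 for i in range(n) for j in range(i, n)]

print("variance of estimable function:", quadratic_form_half_stored(l, vb_half))
```

With the actual GAREML output substituted for the placeholder, the same call returns the variance of the estimated site mean.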

Conclusions 

GAREML  is  an  analytical  tool  for  use  with  models  common  to  forest  genetics.  The 
properties  of  the  variance  component  estimation  algorithm  have  been  documented  by  simulation 
studies  and  the  algorithm  presents  solutions  as  restricted  maximum  likelihood  estimates.  Many 
other  outputs  are  available  from  the  program  including  best  linear  unbiased  predictions, 
generalized  least  squares  estimates  of  fixed  effects,  error  covariance  matrices  of  predictions  and 
estimates,  and  the  asymptotic  covariance  matrix  for  variance  component  estimates. 

GAREML  is  not  intended  to  be  used  as  a  black  box.  The  program  has  many  potential 
uses:  variance  component  estimation,  parental  evaluation,  progeny  evaluation  and  simulated 
evaluation  of  mating  and  field  design.  However,  thoughtful  interpretation  of  the  outputs  is 
needed  in  order  to  realize  the  power  and  utility  of  the  program. 


CHAPTER  6 
CONCLUSIONS 


Optimal  mating  design  for  the  determination  of  genetic  architecture  was  explored. 
General  conclusions  were  reached  through  comparison  of  the  half-diallel,  half-sib  and  circular 
mating  designs.  In  particular,  the  comparison  of  the  half-diallel  and  circular  designs  is  pertinent 
to  the  establishment  of  future  progeny  tests  in  which  full-sib  families  are  desired.  Across  the 
experimental  levels  examined,  the  circular  mating  design  provides  more  efficient  estimates  of 
parameters  for  genetic  architecture  than  the  half-diallel  design.  If  an  estimate  of  the  variance  in 
general  combining  abilities  is  required,  the  half-sib  design  is  more  efficient  than  the  circular 
mating design over most of the experimental levels examined. This pattern of efficiency argues
for complementary mating designs involving half-sib designs (open-pollinated or polycross) to
estimate general combining ability and a second design (full-sib mating) to generate crosses
from which to make selections. Complementary mating designs do require a greater monetary
and  temporal  commitment.  If  this  type  of  commitment  is  not  justified  or  possible,  then  the 
circular  mating  design  should  be  used  to  generate  full-sib  families  and  estimate  genetic  parameters 
simultaneously. 

Considering  field  design  in  combination  with  mating  design,  full-sib  designs  reach 
maximum  efficiency  for  genetic  parameter  estimation  in  fewer  numbers  of  replicates  across 
locations  than  half-sib  designs.  For  any  specific  case  of  field  design  and  the  half-sib  mating 
design,  a  priori  knowledge  of  the  genetic  architecture  is  required  to  choose  the  optimal  field 
design  for  number  of  locations. 


In cases where maximum efficiency of an experimental design is obtained and the
precision of genetic parameter estimates is still less than desired, the optimal use of
experimental units would be disconnected sets of experiments at maximum efficiency with the
parameter estimate then being a mean of the estimates from the disconnected experiments. Of the
three mating designs only the half-diallel exhibits efficiency optima for number of parents. The
optimum for number of parents in half-diallels is always close to and never larger than six
parents with the fluctuation resulting from the genetic architecture. Thus for half-diallels, for
maximum efficiency in genetic parameter estimation, the number of parents should not exceed six
and the desired parameter precision should be obtained by using disconnected sets of six parents.
Optima for number of locations exist for all mating designs and maximum efficiency would again be
obtained by replicating an experiment only for the optimal number of locations. A parameter
estimate of the desired precision would be calculated as a mean of disconnected experiments.

Optimal analysis was dealt with in two stages (estimation of parental worth and estimation
of variance components or genetic architecture). The estimation of parental worth was examined
for  the  half-diallel  mating  design.  It  is  argued,  on  theoretical  grounds  and  in  generality,  that  best 
linear  unbiased  prediction  and  best  linear  prediction  are  more  suited  to  the  problem  of  parental 
evaluation  than  ordinary  least  squares. 

Using simulated data for two mating designs (half-diallel and half-sib), variance component
estimation techniques were compared with varying levels of data imbalance and two levels of
genetic control. In estimating variance components (or genetic ratios such as heritability) four
criteria were adopted for discrimination among estimation techniques (probability of nearness,
bias, mean square error and variance of estimation). Of the four, only bias and variance of
estimation proved informative. Bias proved useful in discriminating among treatments of negative
estimates, with accepting and living with the negative estimates having the least bias, re-solving
the system with negative estimates set to zero intermediate in bias and setting negative estimates
to zero producing the most bias. Variance of estimation also was discriminatory among treatments
of negative estimates, with accepting and living with negative estimates having the highest
variance, setting negative estimates to zero intermediate in variance and re-solving the system
setting negative estimates to zero having the lowest variance.

Variance  of  estimation  was  also  discriminatory  among  units  of  observation  and  variance 
component  estimation  techniques.  Of  the  two  units  of  observation  used  (individuals  and  plot 
means),  individual  observations  produced  estimates  with  better  properties  across  all  levels  of 
imbalance,  mating  designs  and  variance  component  estimation  techniques.  Of  the  variance 
component  estimation  techniques  contrasted,  restricted  maximum  likelihood  produced  estimates 
with  the  best  properties  (bias  and  variance  of  estimation)  across  all  mating  designs,  levels  of 
genetic  control  and  levels  of  imbalance.  Therefore  it  is  proposed  that  restricted  maximum 
likelihood  estimation  with  individual  observations  as  data  should  be  utilized. 

With  the  recommendation  to  use  restricted  maximum  likelihood,  the  program  used  to 
analyze  the  simulated  data  was  rewritten  into  a  user  friendly  format  able  to  analyze  both  full-sib 
and  half-sib  data.  Additional  outputs  (other  than  variance  components)  were  also  added  as 
options.  These  outputs  include  general  and  specific  combining  ability  predictions,  the  asymptotic 
covariance  matrix  for  variance  components,  generalized  least  squares  estimates  of  fixed  effects 
and  the  covariance  matrices  for  predictions  and  estimates. 


APPENDIX 
FORTRAN  SOURCE  CODE  FOR  GAREML 


C******THIS PROGRAM PRODUCES REML AND MIVQUE VARIANCE*************
C****COMPONENT ESTIMATES BY STARTING ITERATION FROM THE***********
C****TRUE VALUES OF THE PARAMETERS THROUGH THE USE OF*************
C***************GIESBRECHT'S ALGORITHM****************************
C        PARAMETERS DETERMINE THE PROGRAM DIMENSIONS: ANY CHANGE IN
C        PARAMETER SIZE DECLARATION SHOULD BE GLOBAL SINCE THEY ARE
C        ALSO SPECIFIED IN THE SUBROUTINES
      PROGRAM MAIN
      PARAMETER (
C     NOBSER IS THE MAXIMUM NUMBER OF OBSERVATIONS
     N NOBSER = 5000,
C     NOBL IS THE MAXIMUM NUMBER OF BLOCKS PER LOCATION
     N NOBL = 36,
C     NOCR IS THE MAXIMUM NUMBER OF FULL-SIB CROSSES
     N NOCR = 75,
C     NOBH IS THE MAXIMUM NUMBER OF FIXED EFFECT LEVELS INCLUDING THE
C     MEAN
     N NOBH = 200,
C     NVARBH DIMENSIONS THE VARIANCE COVARIANCE MATRIX FOR FIXED
C     EFFECTS
     N NVARBH = (NOBH*(NOBH-1))/2 + NOBH,
C     NOGCA IS THE MAXIMUM NUMBER OF PARENTS
     N NOGCA = 50,
C     NOVARG DIMENSIONS THE VARIANCE COVARIANCE MATRIX FOR GCA
     N NOVARG = (NOGCA*(NOGCA-1))/2 + NOGCA,
C     NOX IS THE MAXIMUM NUMBER OF COLUMNS FOR FIXED EFFECTS PLUS
C     RANDOM EFFECTS
C     PLUS ONE FOR THE DATA
     N NOX = 1400,
C     NOCBS IS THE MAXIMUM NUMBER OF LEVELS FOR THE RANDOM EFFECT
C     HAVING THE GREATEST NUMBER, USUALLY CROSS BY BLOCK OR PLOT
C     COMBINATIONS
     N NOCBS = 1000,
C     NTOT IS THE TOTAL NUMBER OF COLUMNS OF NOX PLUS NOCBS
     N NTOT = NOX + NOCBS,
C     OTHER PARAMETERS USE THE PREVIOUS DECLARATIONS TO ALLOCATE
C     SUFFICIENT SIZE TO SYMMETRIC MATRICES STORED AS VECTORS
     N NIZED = NOX*NOCBS,
     N NIXPX = ((NOX*(NOX-1))/2) + NOX,
     N NSIP = NOX + NOCBS,
     N NIZEP = ((NSIP*(NSIP-1))/2) + NSIP)

      COMMON/CMN1/ NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,NOBS,
     N NCOLB,NCOLX,NCOLCB,NCL(9),NORAN,NOFIX,NCLFIX,
     N NCLRAN,NCOLSE,NRAN(9)
      COMMON/CMN2/
     N YQVQY(9),VQVQ(9,9),MEAN(NOBSER),SIG(9),GCA(NOGCA),
     N BHAT(NOBH),SCA(NOCR)
      COMMON/CMN3/ DTERM(8,2),RANNAM(9),DUM2,FMVEC(NOCR),
     N PARENT(NOGCA),LOCO(10),REP(NOBL),DISSET(10)
      DIMENSION TEST(NOBSER),BLOCK(NOBSER),F(NOBSER),M(NOBSER),
     N FM(NOBSER),REML(9),VARHAT(9),SOL(9,10),DUM(9),PRI(9),
     N SET(NOBSER),NAME(9),NUMMY(9),VARG(NOVARG),VARBH(NVARBH)
      INTEGER NCOLT,NCOLB,NCOLG,NCOLS,NCOLGT,NCOLST,NCOLCB,NOBS,
     N NCOLTB,NCOLX,NCOLSE,NCL,NCLFIX,NCLRAN,NORAN,NOFIX,
     N NUMMY,NRAN,NOITS,LEP
      DOUBLE PRECISION YQVQY,SIG,REML,ZAG,VARHAT,SOL,MEAN,DUM,VQVQ,
     N GCA,BHAT,PRI,SCA,SCALES,SCALEG,VARG,VARBH
      REAL CVERG
      CHARACTER*1 DTERM,DUMDUM,DUM2,DUMB
      CHARACTER*80 FMAT
      CHARACTER*16 FLNAME,FM,FMVEC,NT,KICK,LICK
      CHARACTER*11 NAME,RANNAM
      CHARACTER*13 SIGMA
      CHARACTER*8 TEST,LOCO,F,M,PARENT,BLOCK,SET,DISSET,REP
      SIGMA = 'SIGMA-SQUARED'
      NAME(1) = 'LOCATION'
      NAME(2) = 'BLOCK(LOC)'
      NAME(3) = 'SET'
      NAME(4) = 'GCA'
      NAME(5) = 'SCA'
      NAME(6) = 'LOCxGCA'
      NAME(7) = 'LOCxSCA'
      NAME(8) = 'BLOCKxFAM'
      NAME(9) = 'ERROR'
      OPEN(UNIT=13,STATUS='SCRATCH',FORM='UNFORMATTED')
      DO 2031 I=1,8
        DO 2032 J=1,2
          DTERM(I,J) = ' '
 2032   CONTINUE
 2031 CONTINUE
      PRINT *, '     REML VARIANCE COMPONENTS ESTIMATED BY THE METHOD OF
     NSCORING'
      PRINT *, ' THROUGH THE USE OF GIESBRECHTS ALGORITHM'
      PRINT *, ' WRITTEN BY DUDLEY HUBER UNIVERSITY OF FLORIDA'
      PRINT *, 'WARNING YOU HAVE JUST ENTERED THE TWILIGHT ZONE OF
     NVARIANCE COMPONENTS'
      PRINT *, 'ANSWER Y FOR YES OR N FOR NO TO THE FOLLOWING QUESTIONS'
      WRITE(6,2012)
 2012 FORMAT(' *************',/)
      WRITE(6,2012)
 1010 I=0
      J=0
 2500 FORMAT('   PLEASE TRY AGAIN')
      PRINT*,' FIRST THE FACTORS TO BE READ FROM THE DATA WILL BE DETE
     NRMINED'
 2501 PRINT *, ' DOES THE DATA HAVE MULTIPLE LOCATIONS?      '
      READ(6,1501) DTERM(1,1)
      IF((DTERM(1,1).NE.'Y').AND.(DTERM(1,1).NE.'N')) THEN
        WRITE(6,2500)
        GO TO 2501
      ENDIF
 2502 PRINT *, ' ARE THERE MULTIPLE BLOCKS(LOCATION) IN THE DATA?      '
      READ(6,1501) DTERM(2,1)
      IF((DTERM(2,1).NE.'Y').AND.(DTERM(2,1).NE.'N')) THEN
        WRITE(6,2500)
        GO TO 2502
      ENDIF
 2503 PRINT *, ' ARE THERE DISCONNECTED SETS OF GENETIC ENTRIES IN THE
     NDATA?      '
      READ(6,1501) DTERM(3,1)
      IF((DTERM(3,1).NE.'Y').AND.(DTERM(3,1).NE.'N')) THEN
        WRITE(6,2500)
        GO TO 2503
      ENDIF
      WRITE(6,2012)
 7001 PRINT *,' IS THE ANALYSIS BASED ON HALF-SIB (H) OR FULL-SIB FAMILI
     NES (F)?  (H OR F)     '
      READ(6,1501)DUM2
      IF((DUM2.NE.'H').AND.(DUM2.NE.'F')) THEN
        WRITE(6,2500)
        GO TO 7001
      ENDIF
      PRINT *, ' NOW TO DETERMINE FIXED OR RANDOM FACTORS AND PRIORS'
      PRINT *, ' ANSWER F FOR FIXED OR R FOR RANDOM TO DETERMINE STATUS'
      IF (DTERM(1,1).EQ.'N') GO TO 1001
 2504 PRINT *, ' LOCATION IS FIXED OR RANDOM?      '
      READ(6,1501) DTERM(1,2)
      IF((DTERM(1,2).NE.'F').AND.(DTERM(1,2).NE.'R')) THEN
        WRITE(6,2500)
        GO TO 2504
      ENDIF
      IF (DTERM(1,2).EQ.'F') THEN
        J=J+1
        GO TO 1001
      ENDIF
      DTERM(1,2) = 'R'
      PRINT *, ' WHAT IS THE PRIOR FOR LOCATION?      '
      I=I+1
      READ(6,1502) PRI(I)
 1502 FORMAT(F20.6)
 1001 IF (DTERM(2,1).EQ.'N') GO TO 1002
 2505 PRINT *, ' BLOCK IS FIXED OR RANDOM?      '
      READ(6,1501) DTERM(2,2)
      IF((DTERM(2,2).NE.'F').AND.(DTERM(2,2).NE.'R')) THEN
        WRITE(6,2500)
        GO TO 2505
      ENDIF
      IF (DTERM(2,2).EQ.'F') THEN
        J=J+1
        GO TO 1002
      ENDIF
      DTERM(2,2) = 'R'
      PRINT *, ' WHAT IS THE PRIOR FOR BLOCK?      '
      I=I+1
      READ(6,1502)PRI(I)
 1002 IF (DTERM(3,1).EQ.'N') GO TO 1003
 2506 PRINT *, ' SETS ARE FIXED OR RANDOM?      '
      READ(6,1501) DTERM(3,2)
      IF((DTERM(3,2).NE.'F').AND.(DTERM(3,2).NE.'R')) THEN
        WRITE(6,2500)
        GO TO 2506
      ENDIF
      IF (DTERM(3,2).EQ.'F') THEN
        J=J+1
        GO TO 1003
      ENDIF
      DTERM(3,2) = 'R'
      PRINT *, ' WHAT IS THE PRIOR FOR SETS?      '
      I=I+1
      READ(6,1502)PRI(I)
 1003 PRINT *, ' ALL OTHER FACTORS ARE CONSIDERED RANDOM'
      PRINT *, ' ANSWER Y FOR YES OR N FOR NO FOR INCLUSION OF THE FACTO
     NR IN THE MODEL'
      WRITE(6,2012)
 2507 PRINT *, ' IS GCA IN THE MODEL?      '
      READ(6,1501) DTERM(4,1)
      IF((DTERM(4,1).NE.'Y').AND.(DTERM(4,1).NE.'N')) THEN
        WRITE(6,2500)
        GO TO 2507
      ENDIF

      IF (DTERM(4,1).EQ.'N') GO TO 1004
C        PRINT *, ' GCA IS FIXED OR RANDOM?
C        INPUT *, DTERM(4,2)
C        IF (DTERM(4,2).EQ.'F') THEN
C         J=J+1
C         GO TO 1004
C        ENDIF
      DTERM(4,2) = 'R'
      PRINT *, ' WHAT IS THE PRIOR FOR GCA?      '
      I=I+1
      READ(6,1502)PRI(I)
      IF(DUM2.EQ.'H') THEN
        DTERM(5,1) = 'N'
        GO TO 1005
      ENDIF
 1004 PRINT *, ' IS SCA IN THE MODEL?      '
      READ(6,1501) DTERM(5,1)
      IF((DTERM(5,1).NE.'Y').AND.(DTERM(5,1).NE.'N')) THEN
        WRITE(6,2500)
        GO TO 1004
      ENDIF
      IF ((DTERM(5,1).EQ.'N').OR.(DUM2.EQ.'H')) GO TO 1005
C        PRINT *, ' SCA IS FIXED OR RANDOM?      '
C        INPUT *, DTERM(5,2)
C        IF (DTERM(5,2).EQ.'F') THEN
C         J=J+1
C         GO TO 1005
C        ENDIF
      DTERM(5,2) = 'R'
      PRINT *, ' WHAT IS THE PRIOR FOR SCA?      '
      I=I+1
      READ(6,1502)PRI(I)
 1005 IF(DTERM(1,1).EQ.'N') GO TO 1007
      PRINT *, ' IS LOCATIONxGCA INTERACTION IN THE MODEL?      '
      READ(6,1501) DTERM(6,1)
      IF((DTERM(6,1).NE.'Y').AND.(DTERM(6,1).NE.'N')) THEN
        WRITE(6,2500)
        GO TO 1005
      ENDIF
      IF (DTERM(6,1).EQ.'N') GO TO 1006
C        PRINT *, ' LOCATIONxGCA IS FIXED OR RANDOM?      '
C        INPUT *, DTERM(6,2)
C        IF (DTERM(6,2).EQ.'F') THEN
C         J=J+1
C         GO TO 1006
C        ENDIF
      DTERM(6,2) = 'R'
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PRINT  *,  '  WHAT  IS  THE  PRIOR  FOR  LOCATIONxGCA? 
1  =  1+1 
READ(6,1502)PRI(I) 

1006  IF(DUM2.EQ.'H')  THEN 
DTERM(7,1)  =  'N' 

GO  TO  1007 
ENDIF 

PRINT *, ' IS LOCATIONxSCA IN THE MODEL? '
READ(6,1501)  DTERM(7,1) 

IF((DTERM(7,1).NE.'Y').AND.(DTERM(7,1).NE.'N'))  THEN 
WRITE(6,2500) 
GO  TO  1006 
ENDIF 

IF ((DTERM(7,1).EQ.'N').OR.(DUM2.EQ.'H')) GO TO 1007
C        PRINT  *,  '  LOCATIONxSCA  IS  FIXED  OR  RANDOM?      ' 
C        INPUT  *,  DTERM(7,2) 
C        IF  (DTERM(7,2).EQ.'F')  THEN 
C         J=J+1 
C  GO  TO  1007 

C        ENDIF 

DTERM(7,2)  =  'R' 

PRINT *, ' WHAT IS THE PRIOR FOR LOCATIONxSCA? '
I = I+1

READ(6,1502) PRI(I)

1007 PRINT *, ' IS PLOT OR FAMILYxBLOCK IN THE MODEL? '
READ(6,1501)  DTERM(8,1) 
IF((DTERM(8,1).NE.'Y').AND.(DTERM(8,1).NE.'N'))  THEN 

WRITE(6,2500) 

GO  TO  1007 

ENDIF 

IF  (DTERM(8,1).EQ.'N')  GO  TO  1008 
C        PRINT  *,  '  PLOT  OR  FAMILYxBLOCK  IS  FIXED  OR  RANDOM? 
C        INPUT  *,  DTERM(8,2) 
C        IF  (DTERM(8,2).EQ.'F')  THEN 
C         J=J+1 
C  GO  TO  1008 

C        ENDIF 

DTERM(8,2)  =  'R' 

PRINT *, ' WHAT IS THE PRIOR FOR PLOT OR FAMILYxBLOCK? '

I = I+1

READ(6,1502) PRI(I)

1008 PRINT *, ' WHAT IS THE PRIOR FOR ERROR? '
I = I+1

READ(6,1502) PRI(I)

J=J+1 

NOFIX=J 

NORAN=I 

WRITE(6,1009) NOFIX,NORAN
1009 FORMAT(' THE NUMBER OF FIXED FACTORS PLUS THE MEAN = ',I2,/,
N' THE NUMBER OF RANDOM FACTORS PLUS ERROR = ',I2)
PRINT  *,  '  DO  THESE  LEVELS  MATCH  YOUR  INTENDED  MODEL?  Y  OR  N    ' 
READ(6,1501)  DUMDUM 
IF  (DUMDUM.EQ.'N')  THEN 

PRINT  *,  '  RETURNING  TO  INITIALIZATION  OF  MODEL' 

PRINT  *,  '  TO  EXIT  PROGRAM  USE  CONTROL-BREAK' 

GO  TO  1010 
ENDIF 

PRINT *, ' THE INPUT DATA SET NAME IS: '
READ(6,1503)FLNAME 
1503  FORMAT(A16) 
WRITE(6,1011) 

1011 FORMAT(' THE FORMAT OF THE DATA IS: REMEMBERING PARENTHESES',/)
READ(6,1012)FMAT 

1012  FORMAT(A80) 

OPEN (1,FILE=FLNAME,STATUS='OLD')
NOBS=1

1  IF(DUM2.EQ.'H')  GO  TO  2 

IF((DTERM(1,1).EQ.'N').AND.(DTERM(2,1).EQ.'N').AND.(DTERM(3,1).
NEQ.'N')) GO TO 1013

IF((DTERM(1,1).EQ.'N').AND.(DTERM(2,1).EQ.'N'))  GOTO  1014 

IF((DTERM(1,1).EQ.'N').AND.(DTERM(3,1).EQ.'N'))  GOTO  1015 

IF(DTERM(1,1).EQ.'N')  GOTO  1000 

IF((DTERM(2,1).EQ.'N').AND.(DTERM(3,1).EQ.'N'))  GOTO  1016 

IF(DTERM(2,1).EQ.'N')  GOTO  1017 

IF(DTERM(3,1).EQ.'N')  GO  TO  1018 

READ(1,FMT=FMAT,END=3) TEST(NOBS),BLOCK(NOBS),SET(NOBS),
N  F(NOBS),M(NOBS),MEAN(NOBS) 

GO  TO  1019 
1018  READ  (1,FMT  =  FMAT,END  =  3)  TEST(NOBS),BLOCK(NOBS),F(NOBS),M(NOBS), 
N  MEAN(NOBS) 

GOTO  1019 
1000  READ  (1,FMT  =  FMAT,END  =  3)  BLOCK(NOBS),SET(NOBS),F(NOBS),M(NOBS), 
N  MEAN(NOBS) 

GO  TO  1019 

1013  READ  (1,FMT=FMAT,END  =  3)  F(NOBS),M(NOBS),MEAN(NOBS) 
GO  TO  1019 

1014  READ  (1,FMT=FMAT,END  =  3)  SET(NOBS),F(NOBS),M(NOBS),MEAN(NOBS) 
GOTO  1019 

1015  READ  (1,FMT=FMAT,END  =  3)  BLOCK(NOBS),F(NOBS),M(NOBS),MEAN(NOBS) 
GO  TO  1019 

1016  READ  (1,FMT  =  FMAT,END  =  3)  TEST(NOBS),F(NOBS),M(NOBS),MEAN(NOBS) 
GOTO  1019 

1017  READ  (1,FMT=FMAT,END  =  3)  TEST(NOBS),SET(NOBS),F(NOBS),M(NOBS), 
N  MEAN(NOBS) 

GO  TO  1019 

2 IF((DTERM(1,1).EQ.'N').AND.(DTERM(2,1).EQ.'N').AND.(DTERM(3,1).
NEQ.'N')) GO TO 7013

IF((DTERM(1,1).EQ.'N').AND.(DTERM(2,1).EQ.'N'))  GO  TO  7014 

IF((DTERM(1,1).EQ.'N').AND.(DTERM(3,1).EQ.'N'))  GOTO  7015 

IF((DTERM(2,1).EQ.'N').AND.(DTERM(3,1).EQ.'N'))  GO  TO  7016 

IF(DTERM(2,1).EQ.'N')  GO  TO  7017 

IF(DTERM(3,1).EQ.'N')  GO  TO  7018 

READ(1,FMT=FMAT,END=3) TEST(NOBS),BLOCK(NOBS),SET(NOBS),
N  F(NOBS),MEAN(NOBS) 

GO  TO  1019 

7018  READ  (1,FMT=FMAT,END  =  3)  TEST(NOBS),BLOCK(NOBS),F(NOBS), 
N  MEAN(NOBS) 

GO  TO  1019 

7013  READ  (1,FMT=FMAT,END  =  3)  F(NOBS),MEAN(NOBS) 
GOTO  1019 

7014  READ  (1,FMT=FMAT,END  =  3)  SET(NOBS),F(NOBS),MEAN(NOBS) 
GO  TO  1019 

7015  READ  (1,FMT  =  FMAT,END  =  3)  BLOCK(NOBS),F(NOBS),MEAN(NOBS) 
GOTO  1019 

7016  READ  (1,FMT=FMAT,END  =  3)  TEST(NOBS),F(NOBS),MEAN(NOBS) 
GOTO  1019 

7017  READ  (1,FMT=FMAT,END  =  3)  TEST(NOBS),SET(NOBS),F(NOBS), 
N  MEAN(NOBS) 

1019  NOBS  =  NOBS +1 
GOTO  1 

3 NOBS = NOBS-1
CLOSE(1)
WRITE(6,2015)  NOBS 

2015 FORMAT(' THE NUMBER OF OBSERVATIONS IS  ',I4)
IF(DUM2.EQ.'H') GO TO 7019
DO 4 I=1,NOBS
FM(I)  =  F(I)//M(I) 

4  CONTINUE 

7019  K  =  0 

DO 5010 I = 1,8
IF(DTERM(I,1).EQ.'N')  GO  TO  5010 
IF(DTERM(I,2).EQ.'R')  THEN 
K  =  K+1 

RANNAM(K)  =  NAME(I) 
ENDIF 
5010  CONTINUE 

RANNAM(K+1) = NAME(9)
DO 72 I=1,NOCR

FMVEC(I) = ' '
72     CONTINUE 
J  =  0 

DO 162 I = 1,9
IF(PRI(I).GT.0.0) THEN
J=J+1
SIG(J) = PRI(I)

ENDIF 
162   CONTINUE 

NCOLT=0 

NCOLB=0 

NCOLSE  =  0 

NCOLTB=0 

NCOLG=0 

NCOLS=0 

NCOLGT=0 

NCOLST=0 

NCOLCB=0 

IF(DTERM(1,1).EQ.'N')  GOTO  1020 

CALL  NOCOL(TEST,NOBS,LOCO,NCOLT) 
1020 NCL(1) = NCOLT

IF(DTERM(2,1).EQ.'N')  GOTO  1021 

CALL  NOCOL(BLOCK,NOBS,REP,NCOLB) 

1021  IF(DTERM(3,1).EQ.'N')  GOTO  1022 
CALL NOCOL(SET,NOBS,DISSET,NCOLSE)

1022  NCL(3)  =  NCOLSE 

IF((DTERM(1,1).EQ.'N').AND.(DTERM(2,1).EQ.'Y'))  THEN 

NCOLTB  =  NCOLB 

GO  TO  1023 
ENDIF 
NCOLTB  =  NCOLT*NCOLB 

1023  IF(DUM2.EQ.'H')  THEN 

CALL  NOCOL(F,NOBS,PARENT,NCOLG) 
GO  TO  7022 
ENDIF 

CALL  NOPAR(F,M,NOBS,PARENT,NCOLG) 
7022  NCL(2)  =  NCOLTB 
NCL(4)  =  NCOLG 

IF((DUM2.EQ.'H').OR.(DTERM(5,1).EQ.'N')) GO TO 7021
DO 32 I=1,NOBS
IF(I.EQ.1) THEN
FMVEC(I) = FM(I)
NCOLS=1
GO TO 32
ENDIF

DO 33 J = 1,NCOLS
KICK=FM(I) 
LICK=FMVEC(J) 
IF(KICK.EQ.LICK)  GO  TO  32 
33      CONTINUE 

NCOLS  =  NCOLS  +  1 
FMVEC(NCOLS)  =  FM(I) 
32     CONTINUE 

DO 159 K=1,NCOLS-1
N = K+1
DO 159 J = N,NCOLS

IF(FMVEC(K).LT.FMVEC(J))  GO  TO  159 

NT=FMVEC(K) 

FMVEC(K)  =  FMVEC(J) 

FMVEC(J)  =  NT 
159   CONTINUE 

7021  IF(DUM2.EQ.'H')  NCOLS  =  0 
NCL(5)  =  NCOLS 
NCOLST=NCOLS*NCOLT 
NCOLGT=NCOLG*NCOLT 
NCOLCB  =  NCOLS  *NCOLTB 
IF(DUM2.EQ.'H')  NCOLCB  =  NCOLG*NCOLTB 
IF(DTERM(6,1).EQ.'N')  NCOLGT=0 
IF(DTERM(7,1).EQ.'N')  NCOLST  =  0 
IF(DTERM(8,1).EQ.'N')  NCOLCB  =  0 
NCL(6)  =  NCOLGT 
NCL(7)  =  NCOLST 
NCL(8)  =  NCOLCB 
WRITE(6,5005)  NCOLG 

5005 FORMAT(' NUMBER OF PARENTS IS ',I4)
WRITE(6,5006) NCOLS

5006 FORMAT(' NUMBER OF FULL-SIB CROSSES IS ',I4)
NCLFIX  =  1 

NCLRAN  =  0 
DO 1024 I=1,8

IF(DTERM(I,2).EQ.'F') THEN
NCLFIX = NCLFIX + NCL(I)
GO  TO  1024 
ENDIF 

NCLRAN  =  NCLRAN  +  NCL(I) 
1024  CONTINUE 

WRITE(6,6001)  NCLFIX,NCLRAN 
6001 FORMAT(' FIXED EFFECT COLUMNS = ',I8,
N'   RANDOM EFFECT COLUMNS =   ',I8)
CVERG  =  .0001 

PRINT *,' THE CONVERGENCE CRITERION FOR VARIANCE COMPONENTS WHICH
N EQUALS'

PRINT  *,'  THE  SUM  OF  THE  ABSOLUTE  DEVIATIONS  IS  SET  TO  .0001.' 
PRINT  *,'  IF  YOU  WISH  TO  CHANGE  TYPE  Y  IF  NOT  TYPE  N.      ' 
READ(6,1501)  DUMDUM 
1501 FORMAT(A1)

IF(DUMDUM.EQ.'N')  GO  TO  9021 
PRINT*,'  THE  CONVERGENCE  CRITERION  IS:     ' 
READ(6,1502)CVERG 
9021  NCOLX  =  NCLFIX  +  NCLRAN +1 
NOITS = 30
PRINT*,' THE NUMBER OF ITERATIONS ALLOWED IS SET TO 30'

PRINT*,'  DO  YOU  WISH  TO  CHANGE  THIS?   (Y  OR  N)     ' 

READ(6,1501)  DUMDUM 

IF(DUMDUM.EQ.'Y')  THEN 

PRINT*,'  THE  NUMBER  OF  ITERATIONS  DESIRED  IS:    ' 

READ*,  NOITS 

ENDIF 

PRINT  *,  '  IF  THE  SOLUTION  AFTER  ITERATING  TO  CONVERGENCE  CONTAINS 

N  ONE  OR  MORE' 

PRINT *, ' NEGATIVE VARIANCE COMPONENT ESTIMATES!!!!'

PRINT  *,  '  DO  YOU  WISH  TO  RE-SOLVE  THE  SYSTEM  SETTING  NEGATIVE  EST 

NIMATES  TO  ZERO?' 
PRINT  *,  '  TYPE  Y  OR  N       ' 
READ(6, 1501)  DUMB 

CALL  XPRIMX(TEST,BLOCK,SET,F,M,FM) 

REWIND(13) 

DO 801 I=1,NORAN
NUMMY(I)=0
801 CONTINUE
803 DO 50 L=1,NOITS

DO 71 I=1,NORAN

DUM(I)  =  SIG(I) 
71     CONTINUE 

WRITE(6,5001)  L 
5001  FORMAT('  THIS  IS  ITERATION  NUMBER  ',13) 

DO 8001 I=1,NORAN

WRITE(6,154) SIGMA,RANNAM(I),SIG(I)
8001 CONTINUE

DO 21 I=1,NORAN

DO 22 J = 1,NORAN

VQVQ(I,J)=0.0 
22       CONTINUE 

YQVQY(I)=0.0 
21      CONTINUE 

DO 51 I=1,NORAN

IF(SIG(I).LT.0.0)  SIG(I)  =  0.0 
51       CONTINUE 
CALL  DESIGN 
REWIND(13) 
DO 5 I=1,NORAN

SOL(I,NORAN+1) = YQVQY(I)

REML(I)=0.0

IF(NUMMY(I).EQ.1) YQVQY(I) = 0.0

DO 6 J = 1,NORAN

SOL(I,J) = VQVQ(I,J)

IF(NUMMY(I).EQ.1) SOL(I,J)=0.0
6 CONTINUE
5 CONTINUE
CALL L2SWP(SOL,NORAN,NORAN+1,1,NORAN)
DO 7 I=1,NORAN
REML(I)  =  SOL(I,NORAN  +  1) 

7  CONTINUE 
ZAG  =  0.0 

DO 8 I=1,NORAN
ZAG = ZAG  +  DABS(REML(I)-DUM(I)) 

8  CONTINUE 

DO 9 I=1,NORAN

SIG(I)  =  REML(I) 

9  CONTINUE 
IF(ZAG.LT.CVERG)  GO  TO  11 

50    CONTINUE 

11 IF(DUMB.EQ.'N') GO TO 8025
IF(DUMB.EQ.'Y') THEN
LEP=0

DO 851 I=1,NORAN
IF(SIG(I).LT.0.0) LEP=1
IF(SIG(I).LE.0.0) THEN
SIG(I) = 0.0
NUMMY(I)=1 
ENDIF 
851      CONTINUE 
ENDIF 

IF(LEP.EQ.1) GO TO 803
8025 DO 10 I=1,NORAN

VARHAT(I)  =  SIG(I) 

10  CONTINUE 

PRINT  *,  '  WHAT  IS  THE  FILENAME  FOR  THE  VARIANCE  COMPONENT  OUTPUT 
N?   ' 

READ(6,1503)FLNAME 

OPEN  (2,FILE  =  FLNAME,STATUS  = 'UNKNOWN') 
DO  155  J  =  1, NORAN 
WRITE(2,FMT=  154)  SIGMA,RANNAM(J),VARHAT(J) 

155  CONTINUE 

154  FORMAT(1X,A13,A12,F20.6) 
CLOSE(2) 
DO 156 I=1,9

IF(RANNAM(I).EQ.'GCA')  SCALEG  =  VARHAT(I) 

IF(RANNAM(I).EQ.'SCA')  SCALES  =  VARHAT(I) 

156  CONTINUE 

PRINT *, ' DO YOU DESIRE GCA PREDICTIONS? (Y OR N)     '
READ(6,1501)  DUMDUM 
IF(DUMDUM.EQ.'N')  GO  TO  704 

PRINT  *,  '  DO  YOU  HAVE  A  PRIOR  ESTIMATE  OF  GCA  VARIANCE  TO  USE  INS 
NTEAD' 

PRINT *, ' OF THE DATA ESTIMATE? (Y OR N)   '
READ(6,1501) DUMDUM
IF(DUMDUM.EQ.'Y') THEN

PRINT  *,  '  WHAT  IS  THE  GCA  VARIANCE  ESTIMATE  YOU  WISH  TO  USE?  ' 

READ(6,1502)SCALEG 

ENDIF 

DO 157 I=1,NCOLG
GCA(I) = SCALEG*GCA(I)
157   CONTINUE 

PRINT  *,  '  WHAT  IS  THE  FILENAME  FOR  THE  GCA  PREDICTION  OUTPUT?  ' 

READ(6,1503)  FLNAME 

OPEN(4,FILE=FLNAME,STATUS  = 'UNKNOWN') 

DO 178 I=1,NCOLG

WRITE(4,FMT  =  703)  PARENT(I),GCA(I) 
178   CONTINUE 

703 FORMAT(' GCA',1X,A8,F20.6)
CLOSE(4) 

704  IF(DUM2.EQ.'H')  GO  TO  705 
IF(DTERM(5,1).EQ.'N')  GO  TO  705 

PRINT  *,  '  DO  YOU  DESIRE  SCA  PREDICTIONS?  (Y  OR  N)     ' 
READ(6,1501)  DUMDUM 
IF(DUMDUM.EQ.'N')  GO  TO  705 

PRINT  *,  '  DO  YOU  HAVE  A  PRIOR  ESTIMATE  OF  SCA  VARIANCE  TO  USE  INS 
NTEAD' 

PRINT  *,  '  OF  THE  DATA  ESTIMATE?  (Y  OR  N)   ' 
READ(6,1501)  DUMDUM 
IF(DUMDUM.EQ.'Y')  THEN 

PRINT  *,  '  WHAT  IS  THE  SCA  VARIANCE  ESTIMATE  YOU  WISH  TO  USE?   ' 
READ(6, 1502)  SCALES 
ENDIF 
DO 169 I=1,NCOLS

SCA(I)  =  SCALES  *SCA(I) 
169  CONTINUE 

PRINT  *,  '  WHAT  IS  THE  FILENAME  FOR  THE  SCA  PREDICTION  OUTPUT?  ' 

READ(6, 1503)  FLNAME 

OPEN(8,FILE  =  FLNAME,STATUS  = 'UNKNOWN') 

DO 171 I=1,NCOLS

WRITE(8,FMT=707)  FMVEC(I),SCA(I) 
171    CONTINUE 

707 FORMAT(' SCA',1X,A16,F20.6)
CLOSE(8) 

705  PRINT  *,  '  DO  YOU  DESIRE  FIXED  EFFECT  ESTIMATES?  (Y  OR  N)     ' 
READ(6,1501)  DUMDUM 

IF(DUMDUM.EQ.'N')  GO  TO  706 

PRINT  *,  '  WHAT  IS  THE  FILENAME  FOR  FIXED  EFFECTS  ESTIMATES?  ' 

READ(6,1503)  FLNAME 

OPEN(9,FILE  =  FLNAME,STATUS  = 'UNKNOWN') 

WRITE(9,FMT=708) BHAT(1)

708 FORMAT('         MU',T15,F20.6)
J=1
DO 172 I=1,3
IF(DTERM(I,2).EQ.'F') THEN
DO 173 K=1,NCL(I)
J=J+1

IF(I.EQ.1) THEN

WRITE(9,FMT=711)  LOCO(K),BHAT(J) 
ENDIF 

IF(I.EQ.3)  THEN 

WRITE(9,FMT=711)  DISSET(K),BHAT(J) 
ENDIF 

WRITE(9,FMT=709)  NAME(I),K,BHAT(J) 
173       CONTINUE 

ENDIF 
172   CONTINUE 
711    FORMAT(A8,T15,F20.6) 
709  FORMAT(A11,I3,F20.6) 
CLOSE(9) 

DO 726 I=1,NOVARG
VARG(I) = 0.D0

726 CONTINUE

DO 727 I=1,NVARBH
VARBH(I)=0.D0 

727  CONTINUE 

706 PRINT *,'        DO YOU DESIRE THE ASYMPTOTIC VARIANCE COVARIANCE'

PRINT  *,'         MATRIX  FOR  VARIANCE  COMPONENTS?  (Y  OR  N)    ' 

READ(6,1501)  DUMDUM 

IF(DUMDUM.EQ.'N')  GO  TO  751 

PRINT  *,'  WHAT  IS  THE  FILENAME  FOR  VAR(VC)?     ' 

READ(6,1503)FLNAME 

OPEN(12,FILE  =  FLNAME,STATUS  =  'UNKNOWN') 

WRITE(12,755) 
755   FORMATC  ASYMPTOTIC  VARIANCE  COVARIANCE  MATRIX',/) 

DO 752 I=1,NORAN

DO 753 J = I,NORAN
SOL(I,J) = SOL(I,J)*2.0
WRITE(12,754) RANNAM(I),RANNAM(J),SOL(I,J)

753 CONTINUE
752 CONTINUE

754 FORMAT(A11,T15,A11,T30,F20.10)

751    PRINT*,'DO  YOU  DESIRE  THE  ERROR  VARIANCE  COVARIANCE  MATRIX  FOR 
NGCA?  (Y  OR  N)    ' 
READ(6,1501)  DUMDUM 
IF(DUMDUM.EQ.'N')  GO  TO  715 

PRINT  *,'  WHAT  IS  THE  FILENAME  FOR  EVAR(GHAT)?      ' 
READ(6,1503)  FLNAME 

OPEN(10,FILE  =  FLNAME,STATUS  = 'UNKNOWN') 
CALL  VARX(VARG,VARBH) 
WRITE(10,721)
K=0

DO 716 I=1,NCOLG
DO 717 J = I,NCOLG

K=K+1 

WRITE(10,718)  VARG(K),PARENT(I),PARENT(J) 

717  CONTINUE 
716   CONTINUE 

721 FORMAT(' THE ERROR VARIANCE COVARIANCE MATRIX FOR GCA ARRAYED
N AS A VECTOR',/)

718 FORMAT(F20.10,T25,A8,T35,A8)
CLOSE(10)

715  PRINT  *,  '  DO  YOU  DESIRE  THE  VARIANCE  COVARIANCE  MATRIX  FOR  FIXED 
N  EFFECTS?  (Y  OR  N)     ' 

READ(6, 1501)  DUMB 

IF(DUMB.EQ.'N')  GO  TO  719 

IF(DUMDUM.EQ.'N')  CALL  VARX(VARG,VARBH) 

PRINT  *,  '  WHAT  IS  THE  FILENAME  FOR  VAR(BETAHAT)?    ' 

READ(6,1503)  FLNAME 

OPEN(11,FILE=FLNAME,STATUS='UNKNOWN')

K=0

DO 723 I=1,NCLFIX

DO 724 J = I,NCLFIX
K=K+1

WRITE(11,722) VARBH(K)
724 CONTINUE
723 CONTINUE

722 FORMAT(F20.10)
CLOSE(11)

719  STOP 
END 

C  SUBROUTINE  L2SWP  SWPS  THE  DESIGNATED  COLUMNS  OF  A  MATRIX  X  AND 
C  RETURNS  THE  SWEPT  MATRIX  AS  X 

SUBROUTINE  L2SWP(X,NROWX,NCOLXX,NSTA,NEND) 

INTEGER NROWX,NCOLXX,NSTA,NEND,NTOT

DOUBLE  PRECISION  X(9,10),  DMIN,  D,  B,  BB(10) 
C  NSWP  DEFINES  THE  PIVOT  COLUMNS  FOR  SWP 

DMIN=1E-8
C  IF  LESS  THAN  FULL  RANK  MATRICES  ARE  ENCOUNTERED,  DMIN  MUST  BE 
C       EMPLOYED 

C  TO  ZERO  THE  ROW  AND  COLUMN  ASSOCIATED  WITH  THE  DEPENDENCY  TO 
C   PRODUCE  A  GENERALIZED  INVERSE 

DO 10 K=NSTA,NEND

D = X(K,K)

IF (D.LE.DMIN) THEN
DO 21 I=1,NROWX
DO 22 J=1,NCOLXX
X(I,K) = 0.0
X(K,J) = 0.0
22       CONTINUE 
21       CONTINUE 
GO  TO  10 
ENDIF 

DO 20 J=1,NCOLXX
X(K,J) = X(K,J)/D
20 CONTINUE

DO 30 I=K+1,NROWX
C  I SHOULD BE INCREMENTED SO THAT I IS NOT EQUAL TO K
B = X(I,K)
DO 40 L=1,NCOLXX
X(I,L)  =  X(I,L)-B*X(K,L) 
40    CONTINUE 
X(I,K)  =  -B/D 
30    CONTINUE 
X(K,K)=1/D 
C  BACKWARD  ELIMINATION 
NTOT  =  NSTA  +  NEND 
IF(NTOT.EQ.2)  GO  TO  61 
C  SAVING  ABOVE  DIAGONAL  ENTRIES  FOR  MULTIPLICATION  WEIGHTS 
KK=1 

DO  12  J=1,K-1 
BB(KK)  =  X(J,K) 
KK=KK+1 

12  CONTINUE 

C  ZEROING  ABOVE  DIAGONAL  ENTRIES  FOR  INSERTION  OF  INVERSE  VALUES 
DO 13 I=1,K-1
X(I,K)  =  0.0 

13  CONTINUE 

C  DOING  ROW  OPERATIONS  TO  CREATE  ABOVE  DIAGONAL  ENTRIES  FOR  INVERSE 
N=1

DO 70 M=1,K-1
B = BB(N)
N = N+1

DO 80 J=1,NCOLXX
X(M,J)  =  X(M,J)-B*X(K,J) 
80  CONTINUE 
70  CONTINUE 
10  CONTINUE 
61  RETURN 
END 

C     DESIGN  CREATES  DESIGN  MATRICES  FOR  MAIN  EFFECTS  AND  INTERACTIONS 
C  AND  FORMS  THE  NORMAL  EQUATIONS 

SUBROUTINE  DESIGN 
PARAMETER (
N NOBSER=5000,
N NOBL=36,
N NOCR=75,
N NOBH=200,
N NOGCA=50,
N NOX=1400,
N NOCBS=1000,
N NTOT=NOX+NOCBS,
N NIZED=NOX*NOCBS,
N NIXPX=((NOX*(NOX-1))/2)+NOX,
N NSIP=NOX+NOCBS,
N NIZEP=((NSIP*(NSIP-1))/2)+NSIP)

COMMON/CMN1/ NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,NOBS,
NNCOLB,NCOLX,NCOLCB,NCL(9),NORAN,NOFIX,NCLFIX, 
N  NCLRAN,NCOLSE,NRAN(9) 

COMMON/CMN2/ 

N  YQVQY(9),VQVQ(9,9),MEAN(NOBSER),SIG(9),GCA(NOGCA), 
N  BHAT(NOBH),SCA(NOCR) 

COMMON/CMN3/  DTERM(8,2),RANNAM(9),DUM2,FMVEC(NOCR), 
NPARENT(NOGCA),LOCO(10),REP(NOBL),DISSET(10) 

DIMENSION  TK(:),D(:),P(:),TRACER(9) 

ALLOCATABLE  ::  TK,D,P 

INTEGER NCOLT,NCOLTB,NCOLG,NCOLS,NOBS,NCOLGT,NCOLST,NVEC,
N NCOLCB,NCOLX,NSTA,NEND,NCOLRD,NSTAK,NENDK,NWNUM1,NWNUM2,INUM,
N NWNUM3,NCL,NBIG,NORAN,NOFIX,NRAN,NCLFIX,NCLRAN,NDFIX,NCOLSE,
N NODUM

DOUBLE  PRECISION  TK,MEAN,D,TR,TRACER,YQVQY,VQVQ,SIG,GCA, 
N  BHAT,P,SUB,SCA 

CHARACTER*1 DTERM,DUM2

CHARACTER*8 PARENT,LOCO,REP,DISSET

CHARACTER*16 FMVEC

CHARACTER*11  RANNAM 

NBIG  =  NRAN(NORAN) 

DO 1012 I=1,NORAN-1

IF(NRAN(I).GT.NBIG) NBIG = NRAN(I)
1012 CONTINUE

NWNUM1 = (NCOLX*(NCOLX-1))/2 + NCOLX

NWNUM2 = NCOLX*NBIG

NWNUM3 = ((NCOLX+NBIG)*(NCOLX+NBIG-1))/2 + NCOLX + NBIG

ALLOCATE (TK(NWNUM1),D(NWNUM2),P(NWNUM3))

READ(13)  TK 

DO 10 I=1,NWNUM1

TK(I) = TK(I)/SIG(NORAN)

10 CONTINUE

DO 11 I=1,NWNUM2
D(I) = 0.0

11 CONTINUE
DO 12 I=1,NWNUM3
P(I) = 0.0

12 CONTINUE

C*****************************************************************
C      FORMING THE MATRIX TO BE SWP TO PRODUCE YQVQY AND VQVQ
C*****************************************************************
C*****************************************************************
C  TK = X'*INV(VK)*X     COMPLETED
C*****************************************************************

NSTAK=2  +  NCLFIX  +  NCLRAN 
DO 1300 INUM = 1,NORAN-1

NODUM = NORAN-INUM

NCOLRD = NCOLX + NRAN(NODUM)

NSTAK = NSTAK-NRAN(NODUM)

NENDK = NSTAK + NRAN(NODUM)-1
DO  251  I  =  NSTAK,NENDK 
M  =  NVEC(I,NCOLX) 
II  =  I-NSTAK+1 
N  =  NVEC(II,NCOLRD) 
NN  =  N 
DO  252  J  =  I,NENDK 

M  =  M+1 

NN=NN+1 

P(NN)  =  TK(M)*SIG(NODUM) 
252     CONTINUE 

P(N+1)  =  P(N+1)  +  1.0 

251   CONTINUE 

C***********************************************************
C   R = I + SIG(I)*(Zi'*INV(VK)*Zi) HAS BEEN FORMED
C***********************************************************

K = 0
DO 254 J = NSTAK,NENDK

DO 255 I=1,NCOLX

K  =  K+1 

IF(J.LT.I)  THEN 

D(K)=0.0 

GO  TO  255 

ENDIF 
M  =  NVEC(I,NCOLX) 

M  =  M+J-I+1 

D(K)  =  TK(M)*SQRT(SIG(NODUM)) 
255    CONTINUE 
254   CONTINUE 

DO  222  I  =  NSTAK,NENDK 
N  =  NVEC(I,NCOLX) 
II  =  I-NSTAK+1 
NN = NCOLX*(II-1)
DO 223 J = I,NCOLX
M = N+J-I+1
K  =  NN  +J 

D(K)=TK(M)*SQRT(SIG(NODUM)) 
223    CONTINUE 

222 CONTINUE

C*************************************************************
C   D=Zi'*INV(VK)*X*SQRT(SIG(I)) HAS BEEN FORMED
C*************************************************************
C*************************************************************
C  TD = D'
C*************************************************************

K=0

NEND = NRAN(NODUM)
DO 22 I=1,NRAN(NODUM)
N = NVEC(I,NCOLRD)
DO 23 J = NRAN(NODUM)+1,NCOLRD
K=K+1 
M  =  N+J-I+1 
P(M)=D(K) 
23      CONTINUE 
22    CONTINUE 

DO 25 I = NRAN(NODUM)+1,NCOLRD
K=NVEC(I,NCOLRD)
II=I-NRAN(NODUM)
M = NVEC(II,NCOLX)
DO 26 J = I,NCOLRD
K=K+1 
M=M+1 
P(K)=TK(M) 
26     CONTINUE 
25    CONTINUE 

C**************************************************************

C      P=(R||D)//(TD||TK)

CALL VECSWP(P,NCOLRD,NCOLRD,1,NRAN(NODUM))
K=0

DO 226 I=1,NCOLX
II = I + NRAN(NODUM)
M  =  NVEC(II,NCOLRD) 
DO  227  J  =  I,NCOLX 
K=K+1 
M  =  M+1 
TK(K)  =  P(M) 
227     CONTINUE 
226  CONTINUE 
1300  CONTINUE 
K  =  0 
DO 826 I=1,NCOLX
II=I + NRAN(1)

M  =  NVEC(II,NCOLRD) 

DO  827  J  =  I,NCOLX 

K=K+1 

M  =  M+1 

TK(K)  =  P(M) 

827  CONTINUE 
826  CONTINUE 

NDFIX  =  NCLFIX 

CALL VC2SWP(TK,NCOLX,NCOLX,1,NCLFIX,NDFIX)
C******************************************************************
C     PORTIONS OF TK ARE SELECTED AND MULTIPLIED AND THE TRACE CALCU-
C  LATED TO FORM VQVQ
C******************************************************************
C*****************COLUMN 1 TO NORAN-1 OF VQVQ*********************
NEND=1 + NCLFIX
DO 841 J = 1,NORAN-1
NSTA  =  NEND+1 
NEND  =  NSTA  +  NRAN(J)-1 
TR=0.0 

NSTAK=NEND+1 
DO 838 I = J,NORAN-1
IF(I.EQ.J) THEN
DO 828 II = NSTA,NEND
N=NVEC(II,NCOLX)
DO 830 K = II,NEND
M = N + K-II + 1
IF(II.EQ.K) THEN
TR=TR  +  TK(M)*TK(M) 
GO  TO  830 
ENDIF 

TR=TR  +  2*TK(M)*TK(M) 
830       CONTINUE 

828  CONTINUE 
VQVQ(J,I)=TR 
GO  TO  838 

ENDIF 

NENDK  =  NSTAK  +  NRAN(I)-1 
TR=0.0 

DO  833  L  =  NSTA,NEND 
N  =  NVEC(L,NCOLX) 
DO  835  K  =  NSTAK,NENDK 
M  =  N  +  K-L+1 
TR=TR  +  TK(M)*TK(M) 
835       CONTINUE 
833      CONTINUE 

NSTAK=NENDK+1 
VQVQ(J,I)=TR
838 CONTINUE
841 CONTINUE

C***************COLUMN NORAN OF VQVQ****************************
DO 932 I = 1,NORAN-1
TRACER(I)=0.0
DO 933 J = I,NORAN-1

VQVQ(J,I)  =  VQVQ(I,J) 

933  CONTINUE 
932   CONTINUE 

NSTA=2  +  NCLFIX 
DO 935 J = 1,NORAN-1
NEND = NSTA + NRAN(J)-1
DO 934 I = NSTA,NEND

N  =  NVEC(I,NCOLX) 

N  =  N+1 

TRACER(J)=TRACER(J)  +  TK(N) 

934  CONTINUE 
NSTA  =  NEND  +1 

935  CONTINUE 

DO 938 I=1,NORAN-1
VQVQ(I,NORAN)=TRACER(I)
938 CONTINUE
SUB=0.0

DO 936 I=1,NORAN-1
SUB = SUB + TRACER(I)*SIG(I)
DO 937 J=1,NORAN-1

VQVQ(I,NORAN) = VQVQ(I,NORAN)-(SIG(J)*VQVQ(I,J))
937 CONTINUE

VQVQ(I,NORAN) = VQVQ(I,NORAN)/SIG(NORAN)

936  CONTINUE 
NSTAK=NOBS-NDFIX 
TR  =  FLOAT(NSTAK) 

VQVQ(NORAN,NORAN)  =  (TR-SUB)/SIG(NORAN) 
DO 940 I = 1,NORAN-1

VQVQ(NORAN,NORAN)  =  VQVQ(NORAN,NORAN)-(SIG(I)*VQVQ(I,NORAN)) 

940  CONTINUE 

VQVQ(NORAN,NORAN)  =  VQVQ(NORAN,NORAN)/SIG(NORAN) 
DO 941 I=1,NORAN-1
VQVQ(NORAN,I) = VQVQ(I,NORAN)

941 CONTINUE

C*************FORMING VECTOR OF FIXED EFFECTS ESTIMATES*********
DO 951 I=1,NCLFIX
N = NVEC(I,NCOLX)
N = N + NCLFIX-I+2
BHAT(I)=TK(N)
951 CONTINUE

C************FORMING VECTORS OF PREDICTIONS**************
DO 952 I=1,9
IF(RANNAM(I).EQ.'GCA') THEN

NSTA  =  I 

GO  TO  953 
ENDIF 

952  CONTINUE 
GO  TO  955 

953  NEND=0 

DO 954 I=1,NSTA-1
NEND = NEND + NRAN(I)

954 CONTINUE
L=NEND+1

N = NVEC(NCLFIX+1,NCOLX)

L = L+N

DO 955 I=1,NCOLG

L  =  L+1 

GCA(I)  =  TK(L) 

955  CONTINUE 
DO 962 I=1,9

IF(RANNAM(I).EQ.'SCA')  THEN 

NSTA  =  I 

GO  TO  963 
ENDIF 

962  CONTINUE 
GO  TO  965 

963  NEND  =  0 

DO 964 I=1,NSTA-1
NEND = NEND + NRAN(I)

964 CONTINUE
L=NEND+1

N = NVEC(NCLFIX+1,NCOLX)

L=L+N

DO 965 I=1,NCOLS

L=L+1

SCA(I)=TK(L)

965 CONTINUE

C*************FORMING YQVQY*************************************
NSTA = NCLFIX + 2
NEND = NSTA + NRAN(1) -1
N = NVEC(NCLFIX+1,NCOLX)
DO 926 J=1,NORAN-1
DO 925 I = NSTA,NEND

M  =  N  +  I-NCLFIX 

YQVQY(J)  =  YQVQY(J)  +  TK(M)*TK(M) 

925  CONTINUE 
NSTA  =  NEND+1 

NEND  =  NSTA  +  NRAN(J+1)-1 

926  CONTINUE 

NSTA = NVEC(NCLFIX+1,NCOLX) + 1
YQVQY(NORAN) = TK(NSTA)
DO 927 I=1,NORAN-1

YQVQY(NORAN) = YQVQY(NORAN)-(SIG(I)*YQVQY(I))
927   CONTINUE 

YQVQY(NORAN)  =  YQVQY(NORAN)/SIG(NORAN) 

DEALLOCATE  (TK,D,P) 

RETURN 

END 

C****************************************************************

C  THIS FUNCTION COUNTS THE NUMBER OF ENTRIES FOR AN EFFECT
SUBROUTINE NOCOL(VEC,OBS,VEC1,NCOL)
PARAMETER (
N NOBSER=5000)
INTEGER OBS,NCOL

CHARACTER*8 VEC(NOBSER),VEC1(*),Z,X,NT
DO 11 I=1,OBS
IF(I.EQ.1) THEN

VEC1(I) = VEC(I)

NCOL=1

GO TO 11
ENDIF
DO 12 J=1,NCOL

X=VEC(I)
Z=VEC1(J)
IF(X.EQ.Z) GO TO 11
12 CONTINUE

NCOL=NCOL + 1

VEC1(NCOL) = VEC(I)
11 CONTINUE

DO 159 K=1,NCOL-1

N  =  K+1 
DO  159  J  =  N,NCOL 
IF(VEC1(K).LT.VEC1(J))  GO  TO  159 
NT=VEC1(K) 
VEC1(K)  =  VEC1(J) 
VEC1(J)  =  NT 
159   CONTINUE 
RETURN 
END 

C   THIS  FUNCTION  COUNTS  THE  NUMBER  OF  ENTRIES  FOR  PARENTS 

SUBROUTINE NOPAR(VEC1,VEC2,OBS,VEC3,NPAR)

PARAMETER (
N NOBSER=5000,
N NOGCA=50)

INTEGER OBS,NPAR

CHARACTER*8 VEC1(NOBSER),VEC2(NOBSER),VEC3(NOGCA),Y,Z,X,NT

DO 11 I=1,OBS
IF(I.EQ.1) THEN

VEC3(I)  =  VEC1(I) 

VEC3(I+1)  =  VEC2(I) 

NPAR=2 

GO TO 11
ENDIF 
DO  12  J  =  1,NPAR 

X  =  VEC1(I) 
Z=VEC3(J) 
IF(X.EQ.Z)  GO  TO  15 

12  CONTINUE 
NPAR=NPAR  +  1 

VEC3(NPAR)  =  VEC1(I) 
15 DO 13 K=1,NPAR

Y = VEC2(I)
Z = VEC3(K)
IF(Y.EQ.Z) GO TO 11

13 CONTINUE
NPAR = NPAR + 1
VEC3(NPAR) = VEC2(I)

11 CONTINUE

DO  159K=1,NPAR-1 

N  =  K+1 
DO  159  J  =  N,NPAR 
IF(VEC3(K).LT.VEC3(J))  GO  TO  159 
NT=VEC3(K) 
VEC3(K)  =  VEC3(J) 
VEC3(J)  =  NT 
159   CONTINUE 
RETURN 
END 

C**VECSWP PRODUCES A G2 INVERSE OF A SYMMETRIC MATRIX STORED AS**
C***********************A VECTOR*********************************
C******************************************************************

SUBROUTINE VECSWP(VEC,NROWX,NCOLXX,NSTA,NEND)

PARAMETER (
N NOBSER=5000,
N NOBL=36,
N NOCR=75,
N NOBH=200,
N NOGCA=50,
N NOX=1400,
N NOCBS=1000,
N NTOT=NOX+NOCBS,
N NIZED=NOX*NOCBS,
N NIXPX=((NOX*(NOX-1))/2)+NOX,
N NSIP=NOX+NOCBS,
N NIZEP=((NSIP*(NSIP-1))/2)+NSIP)
DIMENSION VEC(*),V(:)
ALLOCATABLE V

INTEGER NROWX,NCOLXX,NSTA,NEND,NUMB,NVEC,V,NUM1,NUM2
DOUBLE PRECISION VEC,DMIN,D,B,C
ALLOCATE (V(NCOLXX))
DMIN=1.0D-8
DO 9 I=1,NCOLXX

V(I)=1

9      CONTINUE 

DO 10 K=NSTA,NEND
NUM2 = -(K*(K-3))/2 + NCOLXX*(K-1)
NUMB = NUM2-1
D = VEC(NUM2)
IF (DABS(D).LE.DMIN) THEN
DO 22 I=1,K
IF(I.EQ.K) THEN

NUM2 = -(I*(I-3))/2 + NCOLXX*(I-1)
GO TO 53
ENDIF

NUM2 = -(I*(I-1))/2 + K + NCOLXX*(I-1)
53 VEC(NUM2) = 0.0
22 CONTINUE

NUM2=NUMB+1
DO 21 J=K+1,NCOLXX
NUM2  =  NUM2+1 
VEC(NUM2)=0.0 
21         CONTINUE 
GO  TO  10 
ENDIF 

DO 23 I=1,NROWX
IF(I.EQ.K) GO TO 23
NUM1 = NVEC(I,NCOLXX)
IF(I.LT.K)  THEN 
NUM2  =  NUM1  +K-I+1 
B  =  VEC(NUM2)/D 
GO  TO  27 
ENDIF 

NUM2  =  NUMB  +  I-K+1 

B  =  (FLOAT(V(I))*FLOAT(V(K))*VEC(NUM2))/D 
27 IF(DABS(B).LT.(1.0D-20)) GO TO 23
DO 24 J = I,NCOLXX
IF(J.EQ.K) GO TO 24
IF(K.LT.J) THEN
NUM2 = NUMB+J-K+1
C = VEC(NUM2)
GO TO 28
ENDIF
NUM2 = -(J*(J-1))/2 + K + NCOLXX*(J-1)

C = FLOAT(V(J))*FLOAT(V(K))*VEC(NUM2)
28 IF(DABS(C).LT.(1.0D-20)) GO TO 24
NUM2  =  NUM1  +  J  -I  +1 
VEC(NUM2)  =  VEC(NUM2)-(B*C) 

24  CONTINUE 
23      CONTINUE 

DO 26 J = K,NCOLXX

NUM2 = NUMB + J-K+1

VEC(NUM2) = VEC(NUM2)/D
26 CONTINUE

DO 25 I=1,K

IF(I.EQ.K) THEN

NUM2 = -(I*(I-3))/2 + NCOLXX*(I-1)
GO TO 54

ENDIF

NUM2 = -(I*(I-1))/2 + K + NCOLXX*(I-1)
54       VEC(NUM2)  =  -VEC(NUM2)/D 

25  CONTINUE 
VEC(NUMB+1)=1/D 
V(K)  =  -V(K) 

10    CONTINUE 

DEALLOCATE  (V) 
RETURN 

END 

C******************************************************************

C**VC2SWP PRODUCES A G2 INVERSE OF A SYMMETRIC MATRIX STORED AS**
C***********************A VECTOR*********************************
C******************************************************************

SUBROUTINE VC2SWP(VEC,NROWX,NCOLXX,NSTA,NEND,NDF)

PARAMETER (
N NOBSER=5000,
N NOBL=36,
N NOCR=75,
N NOBH=200,
N NOGCA=50,
N NOX=1400,
N NOCBS=1000,
N NTOT=NOX+NOCBS,
N NIZED=NOX*NOCBS,
N NIXPX=((NOX*(NOX-1))/2)+NOX,
N NSIP=NOX+NOCBS,
N NIZEP=((NSIP*(NSIP-1))/2)+NSIP)

DIMENSION VEC(*),V(:)

ALLOCATABLE V

INTEGER NROWX,NCOLXX,NSTA,NEND,NUMB,NVEC,V,NUM1,NUM2,NDF

DOUBLE PRECISION VEC,DMIN,D,B,C

DMIN=1.0D-8
ALLOCATE (V(NCOLXX))
DO 9 I=1,NCOLXX

V(I)=1

9      CONTINUE 

DO  10  K  =  NSTA,NEND 

NUM2  =  -(K*(K-3))/2  +  NCOLXX*(K-l) 
NUMB  =  NUM2-1 
D  =  VEC(NUM2) 
IF  (DABS(D).LE.DMIN)  THEN 
NDF  =  NDF-1 
DO 22 I=1,K
IF(I.EQ.K) THEN

NUM2 = -(I*(I-3))/2 + NCOLXX*(I-1)
GO TO 53
ENDIF

NUM2 = -(I*(I-1))/2 + K + NCOLXX*(I-1)
53 VEC(NUM2) = 0.0
22 CONTINUE

NUM2=NUMB+1
DO 21 J = K+1,NCOLXX
NUM2  =  NUM2+1 
VEC(NUM2)  =  0.0 
21         CONTINUE 
GO  TO  10 
ENDIF 

DO 23 I=1,NROWX
IF(I.EQ.K) GO TO 23
NUM1 = NVEC(I,NCOLXX)
IF(I.LT.K)  THEN 
NUM2  =  NUM1  +K-I+1 
B  =  VEC(NUM2)/D 
GO  TO  27 
ENDIF 

NUM2  =  NUMB  +  I-K+1 
B  =  (FLOAT(V(I))*FLOAT(V(K))*VEC(NUM2))/D 

27  IF(DABS(B).LT.(1.0D-20))  GO  TO  23 
DO 24 J = I,NCOLXX

IF(J.EQ.K)  GO  TO  24 
IF(K.LT.J)  THEN 
NUM2  =  NUMB+J-K+1 

C  =  VEC(NUM2) 

GO  TO  28 
ENDIF 

NUM2 = -(J*(J-1))/2 + K + NCOLXX*(J-1)
C  =  FLOAT(V(J))*FLOAT(V(K))*VEC(NUM2) 

28  IF(DABS(C).LT.(1.0D-20))  GO  TO  24 
NUM2  =  NUM1  +  J  -I  +1 
VEC(NUM2) = VEC(NUM2)-(B*C)
24 CONTINUE
23 CONTINUE

DO 26 J = K,NCOLXX
NUM2  =  NUMB  +J-K+1 
VEC(NUM2)  =  VEC(NUM2)/D 
26       CONTINUE 
DO 25 I=1,K
IF(I.EQ.K) THEN

NUM2 = -(I*(I-3))/2 + NCOLXX*(I-1)

GO TO 54
ENDIF

NUM2 = -(I*(I-1))/2 + K + NCOLXX*(I-1)
54       VEC(NUM2)  =  -VEC(NUM2)/D 

25  CONTINUE 
VEC(NUMB+1)=1/D 
V(K)  =  -V(K) 

10    CONTINUE 

DEALLOCATE  (V) 

RETURN 

END 

C******NVEC  COUNTS  THE  PROPER  POSITION  OF  AN  ELEMENT******* 
C*********IN  THE  HALF  STORED  MATRIX  (AS  A  VECTOR)********** 
C*******ACCORDING  TO  ITS  NORMAL  ROW  COLUMN  POSITION******** 
C*****************IN THE ORIGINAL MATRIX*******************

FUNCTION NVEC(NROWS,NCOLXX)
INTEGER NROWS,NCOLXX,NVEC
M = 0
DO 3 I=1,NROWS

IF(I.EQ.1) GO TO 3

M  =  M  +  NCOLXX-(I-2) 
3      CONTINUE 
NVEC  =  M 
RETURN 
END 
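
C  EDITORIAL SKETCH, NOT PART OF THE ORIGINAL PROGRAM.
C  THE LOOP IN NVEC SUMS NCOLXX-(I-2) FOR I = 2,...,NROWS, SO FOR
C  NROWS .GE. 1 IT APPEARS TO REDUCE TO THE CLOSED FORM
C     (NROWS-1)*NCOLXX - ((NROWS-1)*(NROWS-2))/2
C  I.E. THE NUMBER OF PACKED ELEMENTS STORED BEFORE THE DIAGONAL
C  OF ROW NROWS. THE NAMES NVECCF AND IPACK ARE HYPOTHETICAL AND
C  ARE USED HERE ONLY FOR ILLUSTRATION.
C  EXAMPLE: NCOLXX = 4, NROWS = 3 GIVES 2*4 - (2*1)/2 = 7, WHICH
C  MATCHES THE LOOP (4 + 3 = 7). ELEMENT (3,4) IS THEN AT
C  POSITION 7 + 4 - 3 + 1 = 9 OF THE PACKED VECTOR.
      INTEGER FUNCTION NVECCF(NROWS,NCOLXX)
      INTEGER NROWS,NCOLXX
C  THE PRODUCT (NROWS-1)*(NROWS-2) IS ALWAYS EVEN, SO THE
C  INTEGER DIVISION BY 2 IS EXACT
      NVECCF = (NROWS-1)*NCOLXX - ((NROWS-1)*(NROWS-2))/2
      RETURN
      END

C  POSITION OF ELEMENT (I,J), WITH J .GE. I, IN THE HALF STORED
C  MATRIX, FOLLOWING THE PATTERN M = NVEC(I,NCOLX) + J - I + 1
C  USED IN SUBROUTINE DESIGN
      INTEGER FUNCTION IPACK(I,J,NCOLXX)
      INTEGER I,J,NCOLXX,NVECCF
      IPACK = NVECCF(I,NCOLXX) + J - I + 1
      RETURN
      END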

SUBROUTINE  XPRIMX(TEST,BLOCK,SET,F,M,FM) 

PARAMETER  ( 
N  NOBSER  =  5000, 
N  NOBL=36, 
N  NOCR=75, 
N  NOBH  =  200, 
N  NOGCA  =  50, 
NNOX=1400, 
N  NOCBS=1000, 
N  NTOT=NOX  +  NOCBS, 
N NIZED=NOX*NOCBS,
N NIXPX=((NOX*(NOX-1))/2)+NOX,
N NSIP=NOX+NOCBS,
N NIZEP=((NSIP*(NSIP-1))/2)+NSIP)

COMMON/CMN1/ NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,NOBS,
NNCOLB,NCOLX,NCOLCB,NCL(9),NORAN,NOFIX,NCLFIX, 
N  NCLRAN,NCOLSE,NRAN(9) 
COMMON/CMN2/ 

N  YQVQY(9),VQVQ(9,9),MEAN(NOBSER),SIG(9),GCA(NOGCA), 
N  BHAT(NOBH),SCA(NOCR) 

COMMON/CMN3/DTERM(8,2),RANNAM(9),DUM2,FMVEC(NOCR), 
NPARENT(NOGCA),LOCO(10),REP(NOBL),DISSET(10) 
DIMENSION  X(:,:),DBLOCK(:,:),LOC(5,2), 
N  NULVEC(NOBSER),XPX(:) 
ALLOCATABLE  ::  DBLOCK,XPX,X 

INTEGER X,DBLOCK,NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,
N NOBS,NCOLB,NCOLX,NCOLCB,NUM1,NCL,NORAN,NOFIX,
N NCLFIX,NCLRAN,NCOLSE,MLV,LOC,NRAN,NMISS,NULVEC
DOUBLE PRECISION XPX,YQVQY,VQVQ,MEAN,SIG,ZIP,ZAP,GCA,BHAT,SCA
CHARACTER*1 DTERM,DUM2

CHARACTER*8 PARENT,LOCO,REP,DISSET,TEST(NOBSER),BLOCK(NOBSER),
N SET(NOBSER),F(NOBSER),M(NOBSER)
CHARACTER*16 FMVEC,FM(NOBSER)
CHARACTER*11 RANNAM

ALLOCATE  (X(NOBS,NCOLX),DBLOCK(NOBS,NCOLB)) 
PRINT*,  '  ********FORMING  THE  DESIGN  MATRIX**********' 
J=0 

DO 1200 I=1,8
IF((NCL(I).GT.0).AND.(DTERM(I,2).EQ.'R'))  THEN 
J=J  +  1 

NRAN(J)  =  NCL(I) 
ENDIF 
1200  CONTINUE 
DO 47 I=1,NOBS
DO 127 K=1,NCOLB
DBLOCK(I,K) = 0
127 CONTINUE

DO 48 J=1,NCOLX

X(I,J)=0

48      CONTINUE 
47     CONTINUE 

DO 31 I=1,NOBS

X(I,1)=1 
31     CONTINUE 
MLV=1 

IF((DTERM(1,1).EQ.'N').OR.(DTERM(1,2).EQ.'R'))  GOTO  1101
DO  1001  I=1,NOBS
C       FORMING  DESIGN  MATRIX  FOR  TEST
DO  5504  J=1,NCOLT
IF(TEST(I).EQ.LOCO(J))  THEN
NJ=J  +  MLV 
GO  TO  5505 
ENDIF 

5504  CONTINUE 

5505  X(I,NJ)  =  1 

1001  CONTINUE 
LOC(1,1)  =  MLV+1
MLV  =  MLV  +  NCOLT
LOC(1,2)  =  MLV

C       FORMING  DESIGN  MATRIX  FOR  BLOCK 
1101  IF((DTERM(2,1).EQ.'N').OR.(DTERM(2,2).EQ.'R'))  GOTO  1102
DO  1002  I=1,NOBS
DO  5501  J=1,NCOLB
IF(BLOCK(I).EQ.REP(J))  THEN 
NK=J 

GO  TO  5502 
ENDIF 

5501  CONTINUE 

5502  DBLOCK(I,NK)  =  1 

1002  CONTINUE 
NSTA  =  LOC(1,1)
NEND  =  LOC(1,2)
IF(DTERM(1,1).EQ.'N')  THEN 

NSTA=1 
NEND=1 
ENDIF 

DO  136  I=1,NOBS
L=MLV+1 

DO  137  J=NSTA,NEND
DO  138  K=1,NCOLB
X(I,L)  =  X(I,J)*DBLOCK(I,K) 
L=L+1 
138       CONTINUE 
137      CONTINUE 
136    CONTINUE 

LOC(2,1)  =  MLV+1
MLV  =  MLV  +  NCOLTB 
LOC(2,2)  =  MLV 
1102  IF((DTERM(3,1).EQ.'N').OR.(DTERM(3,2).EQ.'R'))  GOTO  1103
DO  1003  I=1,NOBS
DO  5506  J=1,NCOLSE
IF(SET(I).EQ.DISSET(J))  THEN 
NK=J  +  MLV 
GO  TO  5507 
ENDIF 

5506  CONTINUE 

5507  X(I,NK)=1 




1003   CONTINUE 

LOC(3,1)  =  MLV+1
MLV  =  MLV  +  NCOLSE 
LOC(3,2)  =  MLV 
1103   MLV  =  MLV+1 

IF((DTERM(1,1).EQ.'N').OR.(DTERM(1,2).EQ.'F'))  GO  TO  2101
DO  2001  I=1,NOBS
C       FORMING  DESIGN  MATRIX  FOR  TEST
DO  5508  J=1,NCOLT
IF(TEST(I).EQ.LOCO(J))  THEN 
NJ=J  +  MLV 
GO  TO  5509 
ENDIF 

5508  CONTINUE 

5509  X(I,NJ)  =  1 

2001  CONTINUE 
LOC(1,1)  =  MLV+1
MLV  =  MLV  +  NCOLT
LOC(1,2)  =  MLV

C       FORMING  DESIGN  MATRIX  FOR  BLOCK 
2101  IF((DTERM(2,1).EQ.'N').OR.(DTERM(2,2).EQ.'F'))  GO  TO  2102
DO  2002  I=1,NOBS

DO  5510  J=1,NCOLB

IF(BLOCK(I).EQ.REP(J))  THEN 

NK=J 

GO  TO  5511 

ENDIF 

5510  CONTINUE 

5511  DBLOCK(I,NK)  =  1

2002  CONTINUE 
NSTA  =  LOC(1,1)
NEND  =  LOC(1,2)
IF(DTERM(1,1).EQ.'N')  THEN 

NEND=1 
NSTA=1 
ENDIF 

DO  36  I=1,NOBS
L=MLV+1 

DO  37  J=NSTA,NEND
DO  38  K=1,NCOLB
X(I,L)  =  X(I,J)*DBLOCK(I,K) 
L=L+1 
38       CONTINUE 
37      CONTINUE 
36    CONTINUE 

LOC(2,1)  =  MLV+1
MLV  =  MLV  +  NCOLTB 
LOC(2,2)  =  MLV 
2102  IF((DTERM(3,1).EQ.'N').OR.(DTERM(3,2).EQ.'F'))  GO  TO  2103
DO  2003  I=1,NOBS

DO  5512  J=1,NCOLSE
IF(SET(I).EQ.DISSET(J))  THEN 

NK=J  +  MLV 

GO  TO  5513 

ENDIF 

5512  CONTINUE 

5513  X(I,NK)=1 

2003  CONTINUE 
LOC(3,1)  =  MLV+1
MLV  =  MLV  +  NCOLSE 
LOC(3,2)  =  MLV 

C       FORMING  DESIGN  MATRIX  FOR  GCA 

2103  IF(DTERM(4,1).EQ.'N')  GO  TO  2104 
DO  2004  I=1,NOBS

DO  5514  J=1,NCOLG
IF(F(I).EQ.PARENT(J))  THEN 
NL=J  +  MLV 
GO  TO  5515 
ENDIF 

5514  CONTINUE 

5515  X(I,NL)  =    1 
IF(DUM2.EQ.'H')  GO  TO  2004 
DO  5516  K=1,NCOLG

IF(M(I).EQ.PARENT(K))  THEN

NN=K+MLV 

GO  TO  5517 

ENDIF 

5516  CONTINUE 

5517  X(I,NN)  =    1 

2004  CONTINUE 
LOC(4,1)  =  MLV+1
MLV  =  MLV  +  NCOLG 
LOC(4,2)  =  MLV 

2104  IF(DTERM(5,1).EQ.'N')  GO  TO  2105 
NSTA  =  MLV 

DO  34  I=1,NOBS

DO  35  J=1,NCOLS

IF(FM(I).EQ.FMVEC(J))  THEN

X(I,J+NSTA)=1

GO  TO  34 

ENDIF 
35     CONTINUE 
34     CONTINUE 

LOC(5,1)  =  MLV+1
MLV  =  MLV  +  NCOLS 
LOC(5,2)  =  MLV 
2105  IF((DTERM(6,1).EQ.'N').OR.(DTERM(1,1).EQ.'N'))  GO  TO  2106
NSTA  =  LOC(1,1)

NEND  =  LOC(1,2)
NSTAK  =  LOC(4,1)
NENDK  =  LOC(4,2)
DO  49  I=1,NOBS

L  =  MLV+1 

DO  39  J=NSTA,NEND
DO  40  K  =  NSTAK,NENDK

X(I,L)=X(I,J)*X(I,K)

L=L+1 

40  CONTINUE 
39      CONTINUE 
49    CONTINUE 

MLV  =  MLV  +  NCOLGT 

2106  IF((DTERM(7,1).EQ.'N').OR.(DTERM(1,1).EQ.'N'))  GO  TO  2107
NSTAK  =  LOC(5,1)

NENDK  =  LOC(5,2)
DO  41  I=1,NOBS
L=MLV+1 

DO  42  J  =  NSTA,NEND 

DO  43  K  =  NSTAK,NENDK 

X(I,L)=X(I,J)*X(I,K)

L=L+1 

43  CONTINUE 
42      CONTINUE 

41  CONTINUE 

MLV  =  MLV  +  NCOLST 

2107  IF((DTERM(8,1).EQ.'N').OR.(DTERM(2,1).EQ.'N'))  GO  TO  2108
NSTA  =  LOC(2,1)

NEND  =  LOC(2,2)
NSTAK  =  LOC(5,1)
NENDK  =  LOC(5,2) 
IF(DUM2.EQ.'H')  THEN 

NSTAK  =  LOC(4,1)

NENDK  =  LOC(4,2) 
ENDIF 

DO  44  I=1,NOBS
L  =  MLV+1 

DO  45  J  =  NSTA,NEND 

DO  46  K  =  NSTAK,NENDK 

X(I,L)=X(I,J)*X(I,K)

L  =  L+1 
46       CONTINUE 
45      CONTINUE 

44  CONTINUE 

C  X  =  MU|  |HT|  |T|  |TB|  |G|  |S|  |GT|  |ST|  |CB  COMPLETED
DEALLOCATE  (DBLOCK)

PRINT*,  '*******FINISHED  FORMING  THE  DESIGN  MATRIX**********' 
PRINT*,  '*******NOW  CHECKING  FOR  NULL  COLUMNS***************' 
2108  NEND  =  NCLFIX+1
NMISS=0 

DO  3001  K=1,NORAN-1
NSTA  =  NEND+1
NEND  =  NSTA  +  NRAN(K)-1
DO  3002  J=NSTA,NEND
DO  3003  I=1,NOBS
IF(X(I,J).NE.0)  GO  TO  3002 
3003    CONTINUE 

NRAN(K)  =  NRAN(K)-1 
NMISS  =  NMISS+1 
NULVEC(NMISS)=J 
3002  CONTINUE 
3001  CONTINUE 

PRINT*,'***********FINISHED  CHECKING  FOR  NULL  COLUMNS*********' 
WRITE(6,3006)  NMISS 
3006  FORMAT('  THERE  WERE  ',I4,'  NULL  COLUMNS')
IF(NMISS.EQ.0)  GO  TO  3011

PRINT*,'***********NOW  DELETING  NULL  COLUMNS****************'
NULVEC(NMISS+1)  =  NCOLX+1
L=NULVEC(1)
DO  3021  I=1,NMISS

IF((NULVEC(I+1)-NULVEC(I)).EQ.1)  GO  TO  3021
DO  3022  J  =  NULVEC(I)+1,NULVEC(I+1)-1
DO  3023  K=1,NOBS
X(K,L)  =  X(K,J) 
3023     CONTINUE 

L  =  L+1 
3022  CONTINUE 
3021  CONTINUE 
3011  NCLRAN  =  NCLRAN-NMISS 
NCOLX  =  NCOLX-NMISS 
NUM1  =  (NCOLX*(NCOLX-1))/2  +  NCOLX
ALLOCATE  (XPX(NUM1))
DO  10  I=1,NUM1
XPX(I)  =  0.0
10     CONTINUE 

PRINT*,'**********FORMING  DOT  PRODUCTS  OF  DESIGN  COLUMNS*******' 
DO  15  I=1,NCOLX
N  =  NVEC(I,NCOLX)
DO  16  J=I,NCOLX
N  =  N+1 

DO  17  K=1,NOBS
XPX(N)  =  XPX(N)  +  (FLOAT(X(K,I))*FLOAT(X(K,J)))
17       CONTINUE
16      CONTINUE 
15     CONTINUE 

PRINT*,'********FORMING  DOT  PRODUCTS  OF  DESIGN  COLUMNS  AND  THE  D 
NATA  VECTOR********' 
L=NCLFIX+1 
DO  6  J=1,NCOLX
IF(J.LE.NCLFIX)  THEN 
N=NVEC(J,NCOLX) 
N  =  N  +  NCLFIX  +  2-J 
ENDIF 

IF  (J.GT.NCLFIX)  THEN 
N  =  NVEC(L,NCOLX) 
N  =  N+J-NCLFIX 
ENDIF 

DO  7  K=1,NOBS
ZAP=FLOAT(X(K,J)) 
ZIP=MEAN(K) 
IF(J.EQ.L)  ZAP  =  MEAN(K) 
XPX(N)  =  XPX(N)  +  (ZIP*ZAP) 
7        CONTINUE 
6       CONTINUE 

PRINT*, '*******ALL  DOT  PRODUCTS  HAVE  NOW  BEEN  FORMED********' 

PRINT*,'***SAVING  X  PRIME  X  MATRIX  FOR  FUTURE  ITERATIONS****' 

WRITE(13)  XPX 

PRINT*,'*********X  PRIME  X  IS  STORED*********' 

DEALLOCATE  (X,XPX) 

RETURN 

END 

C*****************  HENDERSON'S  ALGORITHM***************************

C***********MODIFIED  TO  OUTPUT  VARIANCE  COVARIANCE**************** 
C****************MATRIX  OF  PREDICTIONS****************************

SUBROUTINE  VARX(VARG,VARBH) 

PARAMETER  ( 
N  NOBSER  =  5000, 
N  NOBL=36, 
N  NOCR=75, 
N  NOBH  =  200, 

N  NVARBH  =  (NOBH*(NOBH-1))/2  +  NOBH,
N  NOGCA  =  50,

N  NOVARG  =  (NOGCA*(NOGCA-1))/2  +  NOGCA,
N  NOX=1400,

N  NIXPX  =  (NOX*(NOX-1))/2  +  NOX,
N  NOCBS  =  1000,
N  NTOT=NOX  +  NOCBS, 
N  NIZED  =  NOX*NOCBS, 
N  NSIP  =  NOX  +  NOCBS, 
N  NIZEP  =  ((NSIP*(NSIP-1))/2)  +  NSIP)
COMMON/CMN1/  NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,NOBS,
N  NCOLB,NCOLX,NCOLCB,NCL(9),NORAN,NOFIX,NCLFIX, 
N  NCLRAN,NCOLSE,NRAN(9) 
COMMON/CMN2/ 

N  YQVQY(9),VQVQ(9,9),MEAN(NOBSER),SIG(9),GCA(NOGCA),
N  BHAT(NOBH),SCA(NOCR) 

COMMON/CMN3/  DTERM(8,2),RANNAM(9),DUM2,FMVEC(NOCR), 
N  PARENT(NOGCA),LOCO(10),REP(NOBL),DISSET(10)
DIMENSION  TK(:),D(:),VARG(NOVARG),VARBH(NVARBH), 
N  NSIG(9,2),XPX(:) 
ALLOCATABLE  ::  TK,D,XPX 

INTEGER  NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,NOBS, 
N  NCOLB,NCOLX,NCOLCB,NCL,NORAN,NOFIX,NCLFIX,NSIG,NCOLTK, 
N  NCLRAN,NCOLSE,NRAN,NSTA,NEND,NSTAK,NENDK,NCOLD,NOZERO,
N  NUM1 

DOUBLE  PRECISION  YQVQY,VQVQ,MEAN,SIG,GCA,BHAT,SCA,TK,D, 
N  VARG,VARBH,XPX 
CHARACTER*1  DTERM,DUM2
CHARACTER*16  FMVEC
CHARACTER*11  RANNAM
CHARACTER*8  LOCO,PARENT,DISSET,REP
NUM1=(NCOLX*(NCOLX-1))/2  +  NCOLX
ALLOCATE  (XPX(NUM1),D(NCOLX))
READ(13)  XPX 
K=0 

NOZERO=0 
NCOLTK  =  NCLFIX 
NCOLD  =  NCLFIX+1
DO  22  I=1,NORAN-1
NCOLD  =  NCOLD  +  NRAN(I) 
IF(SIG(I).EQ.0.0)  THEN 
NOZERO=NOZERO+1
NSIG(NOZERO,1)  =  NCOLD  +  1-NRAN(I)
NSIG(NOZERO,2)  =  NCOLD 
GO  TO  22 
ENDIF 

NCOLTK  =  NCOLTK  +  NRAN(I) 
DO  21  J=1,NRAN(I) 
K  =  K+1 

D(K)  =  SIG(I) 

21  CONTINUE 

22  CONTINUE 
ALLOCATE  (TK(NUM1))
K  =  0 

DO  302  I=1,NCOLX

IF(I.EQ.(NCLFIX+1))  GO  TO  302
DO  23  L=1,NOZERO

IF((I.GE.NSIG(L,1)).AND.(I.LE.NSIG(L,2)))  GO  TO  302
23  CONTINUE

N  =  NVEC(I,NCOLX) 
DO  301  J  =  I,NCOLX 

IF(J.EQ.(NCLFIX+1))  GO  TO  301 

DO  24  L=1,NOZERO

IF((J.GE.NSIG(L,1)).AND.(J.LE.NSIG(L,2)))  GO  TO  301 

24  CONTINUE 
NN  =  N+J-I+1
K=K+1 
TK(K)=XPX(NN)/SIG(NORAN) 

301  CONTINUE 

302  CONTINUE 
K=0 

DO  28  I  =  NCLFIX+1,NCOLTK
J  =  NVEC(I,NCOLTK) 
N=J  +  1 
K=K+1 

TK(N)=TK(N)+  (1.D0/(D(K)))
28     CONTINUE 

DEALLOCATE  (D,XPX) 
C**************EQUATIONS  HAVE  NOW  BEEN  FORMED**************************
CALL  VECSWP(TK,NCOLTK,NCOLTK,1,NCOLTK)
DO  952  I=1,9
IF(RANNAM(I).EQ.'GCA')  THEN 
NSTA=I 
GO  TO  953 
ENDIF 

952  CONTINUE 

953  NEND=0 

DO  954  I=1,NSTA-1
IF(SIG(I).EQ.0.0)  GO  TO  954
NEND  =  NEND  +  NRAN(I) 

954  CONTINUE 

NSTAK = NEND  +  NCLFIX  + 1 
NENDK  =  NSTAK  +  NRAN(NSTA)-1
N=0 

DO  955  I  =  NSTAK,NENDK 
K=NVEC(I,NCOLTK) 
DO  956  J=I,NENDK
KK=K+J-I+1 
N  =  N+1 

VARG(N)=TK(KK) 
956      CONTINUE 

955  CONTINUE 
N=0 

DO  957  I=1,NCLFIX
K=NVEC(I,NCOLTK)
DO  958  J  =  I,NCLFIX
KK  =  K+J-I+1
N  =  N+1 

VARBH(N)=TK(KK) 
958      CONTINUE 
957     CONTINUE 

DEALLOCATE  (TK) 

RETURN 

END 